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8. INTRODUCTION 


This report is a continuation of Part I [6] where we have presented an implementation of 
the look-ahead Lanczos algorithm for general non-Hermitian matrices A. 

The purpose of the present paper is twofold. First, we show how the look-ahead Lanc- 
zos process — combined with a quasi-minimal residual (QMR hereafter) approach can 
be used to solve large sparse non-Hermitian linear systems Ax = b. An implementation of 
the resulting QMR method is discussed in detail. The QMR approach was first proposed for 
the special case of complex symmetric matrices in [5]; for further properties of the method 
for general non-Hermitian matrices, we refer the reader to [7]. Secon , we repor numeric 
experiments with our implementation of the look-ahead Lanczos algorithm, both for eigen- 
value problems and linear systems. Also, program listings of FORTRAN implementations 
of the look-ahead Lanczos algorithm and the QMR method are included. 

Note that we continue the numbering of Part I [6] of this paper. Hence, the numbers 
1, 2, . . . , 7 refer to sections in Part I. 

The outline of Part II is as follows. In Section 9, we describe the basicidea of the 
QMR approach. In Section 10, details of an actual implementation of the QMR algorit m 
are given. In Section 11, we discuss some of the properties of the QMR method. It is 
shown, for example, how iterates of the biconjugate gradient algorithm (BCG hereafter) 
can be obtained - when they exist - from the QMR algorithm. In Section 12 we briefly 
discuss how to incorporate preconditioning into the QMR method and describe two precon- 
ditioners which we have used for our numerical tests. In Section 13, numenc^ examples 
are presented. In Section 14, we make some concluding remarks. Finally, FORI RAIN 
programs are listed in an Appendix. 

Throughout this paper, A denotes a complex, in general non-Hermitian, NxN matrix. 

For given non-zero starting vectors, m € C N and w x € C N , we denote by t>„ and w n , 
n = 1 2 ,... the vectors (normalized to have unit length) generated by the look-ahead 
Lanczos method described in Part I [6]. Generally, we use the same notation as introduced 

in Sections 2 and 3. Hence 

V'( n ) = [t) 1 U 2 6„] = [Vl ^2 

811(1 = [til u>2 ... «>»] — [Wi ^2 ••• ^*1- 

Here Jb = Jfc(n) is the number of the block containing the vector v n . Recall (cf. (2.1) and 

(2.19)) that the Lanczos vectors span the Krylov subspaces 

K n (t>i , A) := span{t>i , Av x , . . . , A w_1 v x } , 

K n (v>i,A T ) := span{t0i,A T u>i,...,(A T ) t»i}, 


K.(vi,A) = {V w * | * € C*} , 
K„(w u A T ) = {W ,( " ) * I 2 e C "} • 
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( 8 . 2 ) 

(8.3) 



By (2.20), left and right Lanczos vectors corresponding to different blocks are biorthogonal, 

i.e., 

WfVi = 0, j ? l (8.4) 

Furthermore, in view of (3.7), 


Wf Vi is nonsingular for complete blocks /. 
Finally, we will make use of the first relation in (3.3): 


AV {n) = y< n )# (n) + [0 
Here, by (3.5) and (3.6), 


#(») := 


di $ 2 0 


72 <*2 

0 

m * 

• • 

L o 


0 Pn+l^n+l ] . 


0 


■*. 0 

h 

0 7* out 


( 8 . 5 ) 


( 8 . 6 ) 


(8.7) 


is a n x n block tridiagonal matrix with diagonal and sub-diagonal blocks of the form 



* » • • • • • • • • ♦ 


1 

s 

o 

o 

k 


Pn ,+ 1 '• : 


: **. 0 

dj = 

0 Pn*+ 2 '• : 

• • « * " 
• • • • • 
. • • • . 

7/ = 

• > • 
• * * 


„ 0 — 0 1 *- 


o • • • • * • *•’ 0 


( 8 . 8 ) 


Moreover, we have set 



j — 2, 3,. 


(8.9) 


Note that J7<"> and all the blocks o, are upper Hessenberg matrices with positive subdi- 
agonal elements. 
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9. THE QUASI-MINIMAL RESIDUAL ALGORITHM 

We are interested in using the look-ahead Lanczos algorithm to solve linear systems 

Ax = b. (9-1) 


Here, b € C N is some given right-hand side. Furthermore, it is always assumed that A is 
nonsingular. 

Given any initial guess x 0 € C N for the exact solution A -1 6 of (9.1), we will construct 
iterates x„, n = 1, 2, . . ., such that 


x n exo + K n {ro,A). (9-2) 

In the sequel, r n = b-Ax n will denote the residual vector corresponding to the nth iterate 
x„. We choose the initial residual = r 0 as one of the two starting vectors for the look- 
ahead Lanczos algorithm. Then, in view of (8.2), the right Lanczos vectors • • • ,v n 

span K n (r 0 ,A), and hence we have the parametrization 

x n = x 0 + V (n) z, z €E C", (9-3) 


for all possible iterates (9.2). Note that the second starting vector, u>i € C N , is still 
unspecified. Due to the lack of a criterion for the choice of u>i, one usually sets wi - r 0 m 
practice. 

Next, letting 



' H <»> ' 

.Pn+ \tl \ ’ 



0 1 ], 


(9.4) 


we can rewrite (8.6) as 

AV r(n) = V r(n+1 ^^" ) . (9- 5 ) 

By (9.5), the residual vectors corresponding to (9.3) satisfy 

r„ = ro - = r„ - = V r( ~ f,) (3<” ) - %’>*) , C9-6) 


= [ Pi 0 • • • 0] T e R n+1 with Pi = ||r 0 || . 

We introduce an (n + 1) x (n + 1) diagonal matrix ft (n) = diag(wi,w 2 , . . .,w„ + i), > 0, 

to serve as a free parameter that can be used to modify the scaling of the problem. In 

our numerical experiments, the simplest scaling fl (n) = I«+i gave satisfactory results. 
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However, better strategies for choosing might be possible, and therefore we have 
formulated the QMR approach with a general scaling matrix & n K With it, (9.6) reads 


r n = v r(n+1) fi (n) (p (n) - tf e (n) z) 

= V'<" +1 > («<*>) “ - SI , 


(9.7) 


where 

Ideally, we would like to choose z € C n in (9.3) such that ||r n || is minimal. However, 

since in general V (n+1) is not unitary, this would require 0(Nn 2 ) work, which is too 
expensive. We will instead minimi ze just the Euclidean norm of the bracketed terms, t.e., 

we will choose z = z (n) € C n as the solution of the least squares problem 


£(") _ ft( n )#( n )*( n ) 


min n ||/9 (n) - fi (n) H e (n) z||. 
z € C 


(9.8) 


Note that, by (8.7-9) and (9.4), H ( e n) and fi (n) Hi n) are (n + 1) x n matrices with full 

column rank n. This guarantees that the solution z (n) of (9.8) is unique. Hence, (9.8) 
together with (9.3) define a unique nth iterate x n . In view of the minimization property 

(9.8), we refer to this iteration scheme as the quasi-minimal residual (QMR) method. 

For the solution of the least squares problem (9.8), we use the standard approach (see, 
c.g., [8, Chapter 6]) based on a QR decomposition of 



' RW ' 

0---0 
» * 




(9.9) 


Here, Q (n) is a unitary (n + 1) x (n + 1) matrix, and R (n) is a nonsingular upper triangular 
n x n matrix. Inserting (9.9) in (9.8) yields 


min 0 ( "> - ft (n) H< n) z 
z € C" 11 


nun 

zee" 


nun 
z € C" 


nun 
z € C n 


p-> -(<?<->)" ["'."p 

(<5 < " > £ < " ) - [u^'q 
|q < "¥" ) - [, 


' RW ' 


o-o 

Z 


(9.10) 
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Hence, z (n) is given by 


(") = (^"^) ’ w ^ ere = 


Tn 

Lf n +1 


= Q(")p( n )' ( 9 . 11 ) 


Furthermore, we have 



fi (n)^r(n) 2 (n) 


= l^n+ll- 


( 9 . 12 ) 
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10. IMPLEMENTATION DETAILS 


In this section, we describe in detail the actual implementation of the QMR algorithm. 
We break the discussion in two logical blocks: first, updating the QR decomposition of 

Q,( n )ffg n \ and second, updating the QMR iterates x n . Many of the details of updating 

the QR decomposition of can also be found elsewhere (see, e.g., [8, Chapter 12]); 

they axe included here for completeness. Furthermore, we remark that the approach for 
updating the iterates x n is based on a technique which was first used by Paige and Saun- 
ders [11] in connection with their SYMMLQ and MINRES algorithms for real symmetric 
matrices. 

The QR decomposition (9.9) of f is computed by means of Givens rotations, 

taking advantage of the fact that fi (n) ^ n) is an upper Hessenberg matrix. We use Givens 
rotations of the form 


G n = 


In - 1 0 0 

0 c n Sn 

0 -3^ c n 


, with c„ G R, s n G C, + |s n | 2 — !• (10-1) 


Let 


h = 


x 

a 

6J 


G C n+1 , 


^ 0 , 


be a given vector. Then, by choosing 


c„ = . 3^ = c„ if a ^ 0, 

M + 1 ? 

c„ = 0, sZ = 1, if a = 0, 


( 10 . 2 ) 


in (10.1), we obtain a Givens rotation G n which zeroes out the last element of h 



Here ‘x’ denotes elements that are left unchanged by the rotation. Clearly, G n is unitary; 
furthermore, products of Givens rotations are also unitary. In particular, 


Q (n) = G n 


G n - 1 0 

0 1 


G\ 0 

0 In-l 


(10.3) 
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is unitary. In general, then, the QR decomposition of can be computed as follows. 

After the nth column of H[ n) is built, one first premultiplies it by ft (n) , and then by all 
the previous Givens rotations, thus computing 


h = G n - 1 



0 

1 


G i 0 

0 I n - 1 



One then computes G„ from (10.2) with a = h n , b = h n +i (= u; n +i/>„+i), and finally 
obtains the nth column of R (n) by applying G„ to h. For later use, we notice that 


(* w ), 


= W„+ip n +l 


(10.4) 


which is readily verified by means of (10.2). 

In our case, is also block tridiagonal, which means that not all the previous 

Givens rotations have to be applied. Let n G denote the index of the first Givens rotation 
that has to be applied. Then 


n G 


■{ 


max (n*_i - 1, 1) if v n is an inner vector, 
max (n*_ 2 -1,1) if v n is a regular vector. 

Finally, note that if n G > 1, then applying G„ c to ^ mtroduce a 

non-zero element in the n G position. 

Once the QR decomposition is updated, one updates the vector t (n) in (9.11) by setting 


t (n) = G n 


t<— ») 1 
0 


(10.5) 


Clearly, by (10.1), i (n) differs from t (n_1) only in its last two entries which are given by 

t„ = c n f n and f„+i = —s n ■?»»• (10.6) 

Next, we show that this newly introduced element f„+i can be used to obtain an 
upper bound for the norm of the residual r n . First, note that, in view of (9.7), 


|r„ll = 


v<"+» (ti*"*) -1 (0< n > - 
v^\\ ||(n<"))’ 1 || 


(10.7) 
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Now 


n+l) 


< VnTT, 


since V r ^ n+1 ^ has n + l columns of Euclidean norm 1, and 

| =max *' = l,...,n + l, 

as is diagonal. Combining (10.7-9) and (9.12), we get 

IMI < Vn+ 1 |f»+il max (i) . 


( 10 . 8 ) 


(10.9) 


( 10 . 10 ) 


Let us now turn to updating the iterates x„. We will update x n only when the current 
block k is closed, t.e., when 

n = n*+i-l. (10.11) 

This means that the QR decomposition of has been updated as described above, 

and that {t^) Vn will hereafter remain unchanged. Inserting (9.11) in (9.3) yields 

x n — xq + V (n >z (B) = x 0 + V (B) (r ( b) ) _1 (/ (n) ) 

V / \ /!:„ (10.12) 

= x„ + /><"> (<<”>) i n 

where we have introduced the matrix of direction vectors 

p(n) = y(n) = |p i p 2 p* j (10.13) 

Here the partitioning of P (n) into submatrices on the right-hand side of (10.13) is similar to 
(8.1): the matrices P< contain the direction vectors corresponding to block 1, / = 1, 2, . . . , k. 
Then, with (10.11-13), we arrive at the update formula 

x„=x„ 4 _ 1 +P fc (t (n) ) , n = n*+i — 1. (10.14) 

\ / fifcin 


It remains to show how to compute Pk- To this end, we first note that 



(10.15) 
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where Z k is a 1 x h k block, Y k is a h k - i x h k block, and R k is a h k x h k block. Recall that 
h k -\ — n k — n k - 1 and h k = n* + j — n k denote the sizes of blocks k — 1 and k } respectively. 

The block structure of R (n) in (10.15) is a consequence of the block structure (8.7) of (n) 

and of the fact that the fill-in from the QR decomposition of ft (n) J?e (n) is limited to the 

row above each block $i, l = 2,3, . . . ,k. 

To save work, we compute and store the vectors P k R k , as opposed to computing and 
storing P k . Rewriting (10.15) in terms of the stored vectors, we obtain the update formula 

= Zt (10.16) 

Moreover, instead of (10.14), we actually use the update formula 

x„ = x„,-i + (P t R, )(«»)'* (< < " ) ) niin . " = "*+1 - !• 

Finally, a note about computing the middle term in (10.16). If one computes in order from 
left to right, *.e., 

(( P k -2R k -2)(Rk-2 ) Zk ’ 

then the work involved is 0{Nh k - 2 + Nh k ). If one computes in order from right to left, 
i.e., 

(P*- 2 R»-2) ((JU-s)" 1 0 z t °])> 

then the work involved is 0(h k h k -2 + Nh k h k - 2 ). This means that if h k = 1 or h k - 2 — 1, 
it is cheaper to multiply from right to left; otherwise, it is cheaper to multiply from left 
to right. In both cases, one takes advantage of the fact that Z k is a row vector, which 
means that only the last column of the matrix premultiplying Z k has to be computed. In 

particular, only the last column of (R k - 2) -1 is needed and has to be stored. 
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11. FURTHER PROPERTIES OF THE QMR ALGORITHM 


In this section, we derive an update formula for the QMR residual vectors. Moreover, it 
is shown that BCG iterates can be easily recovered from the QMR process. 

In the QMR algorithm described in the last section, neither the residual vectors r„ 
nor their norms ||r n || are generated explicitly. Instead, the upper bound (10.10) for |Jr n || 
is available. In our implementation, we monitor this upper bound and switch over to com- 
puting the true residual vector and its norm only in the last few iteration steps. Another 
possibility would be to update the residual vector in each step. The following proposition 
shows that this can be done at the cost of one additional SAXPY per step. 

Proposition 1. For n = 1,2, . . . : 

r„ = M„| J r.-i + £ ^ (1U) 

W„+l 


Proof. By inserting z = from (9.11) in (9.7) and using (9.9), we obtain 


T n = 'Tn+iyn+l 


where 


y„ + i = V' < " +1) («<“>) "' (<?<”>) " 



Now note that, by (10.3) and (10.1), 



O' 


0 0 ■ 

($(«-!))* | 


In-1 : : 

0 0 

0 


0 • • • 0 c n -s n 

.0 0 1. 


.0 0 3^ c n . 


( 11 . 2 ) 


(11.3) 


By means of (11.3), one readily verifies that two successive vectors y„+i and y n in (11.2) 
are connected by 

y »+ 1 = -s n y n H (U- 4 ) 

Wn+l 

Finally, by inserting (11.4) in (11.2) and using the second relation in (10.6), we obtain 

( 11 . 1 ). □ 

Next we turn to the BCG method [9] and show its connection with the QMR algorithm. 
In order to distinguish quantities from the two approaches, superscripts BOG and QMR 
will be used. In the sequel, it is always assumed that BCG and QMR are started with the 
same initial guess xq. 
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In the BCG approach, one aims at computing iterates xf* 00 which are characterized 
by the Galerkin type condition 

w T (b - Ax^ 00 ) = 0 for all wEK n (w\,A T ), x^ 00 E x 0 + K n {ro,A). (11-5) 

(see, e.g., [13]). Unfortunately, such an iterate need not exist for every n. This is one of 
the two sources for possible breakdowns which can occur in the classical BCG algorithm. 

Next, we rewrite the condition (11.5) in terms of quantities generated by the look- 
ahead Lanczos algorithm. For this purpose, the same parametrization as in (9.3) is used: 

*f» = x 0 + V (n) u<">, u<"> G C". 

Then, in view of (8.6), the corresponding residual vector satisfies 

r^ OG = b- Ax? 0 = r 0 - 4V (n) u<"> 

= yM (-/") _ £<»>„<»>) _ Pn+l („<">)__ 6 n +i, (11.6) 

where 7^"* = [ Pi 0 0 6 R n , Pi — ll r o||- 

Next, inserting (11.6) in (11.5) and using (8.3), we can rewrite (11.5) as follow: 

(iy(")) T U(") ( 7 < n > - £<">«<">) = Pn+i (« (n) ) n {w^) T v n+ 1 , (11.7) 

xf* = x 0 + V (n) « (n) . 

In analogy to the QMR algorithm, we will attempt to recover the BCG iterate only 
when the current block k is closed. Thus, in the sequel, it is always assumed that n = 
n* + i — 1, (cf. (10.11). This guarantees that, by (8.5) and (8.4), 

•p’( n ) i 8 nonsingular and v n +i = 0, (11-8) 

respectively. Finally, with (11.8), the equation (11.7) reduces to 

£(«)„(«) = 7 <»>. (11.9) 

By means of (11.9), we can now derive a simple criterion for the existence of the nth 
BCG iterate. In the following proposition, cj and Sj, j = 1,. . . ,n, denote the elements of 
the jth Givens rotation used in our implementation of the QMR method, as described in 

Section 10. Moreover, p„ is the last column of the matrix P (n) introduced in (10.13). 
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Proposition 2. Let n = njt+i — 1, k — 0, 1, . . .. Then the following are equivalent: 
(i) the BCG iterate x%°° defined by (11.5) exists; 

(ii) is nonsingular; 

(iii) c„ ^ 0. 

Moreover, if x^ 00 exists, then 


x« x = i« MR + r - |s 2 " |2 p„, 

(11.10) 

rf» = -^ +1 (u<")) n e„ +1 . 

(li.ii) 

Ikf^ll = ll r 0 II * I s l 5 2 ‘ • ■S„-lSn| Wn+lCn ' 

(11.12) 


Proof. Clearly, an nth BCG iterate exists if, and only if, the linear system (11.9) has a 
solution. Recall that, by (11.6), 7 (n) is a nonzero multiple of the first unit vector, and 

that H is an upper Hessenberg matrix whose subdiagonal elements are all nonzero (by 
(8.9)). Hence, the extended coefficient matrix [ 7 (n) H {n) ] of (11.9) has full row rank n. 

Consequently, (11.9) has a solution if, and only if, H {n) is nonsingular. This shows the 
equivalence of (i) and (ii). 

Next, using (9.9), (9.4), (10.1), and (10.3), one readily verifies that 


Q(n-l)fl(n-l)£(n) 


/«-! 0 
0 c„ 




(11.13) 


This relation implies that (ii) and (iii) are equivalent. 

Now assume c n ^ 0. With (11.9) and (11.13), it follows that 


X™ = xo + 
u (n) - (R (n) ) _1 


In-l 0 

0 l/c n J 


Q(n-l)fl(n-l) 7 ( n ) 


(11.14) 


Recalling the definitions of 7 (n) and 0 (n) in (11.6) and (9.6), respectively, and using (9.11), 
we can rewrite (11.14) as follows: 


xf» = x 0 + V<"V”> where u<“> = («<”>) ‘ 


(11.15) 


By comparing (11.15) with (10.12), we obtain the relation 


-BOG _ X QMR 
x n ~ *n 


+ 



Pn 
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which, in view of (10.6), is just (11.10). Equation (11.11) follows by inserting (11.9) in 

( 11 . 6 ). 

Finally, from (11.11) and (11.15), we deduce that 


Pn 41 

Tn 

Cn 



where £„ = 


(11.16) 


Note that, in view of (10.4), 


Pn+l = 


|^n£n | 

W„+i 


(11.17) 


Moreover, by (10.6) and since T\ — u>i ||ro||, 

|r»| = wi 11 ^* 0 1| • |«i ^2 (11.18) 

By inserting (11.17) and (11.18) in (11.16), we get (11.12), and this concludes the proof. 0 
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12. PRECONDITIONING 

As for other conjugate gradient type methods, for solving realistic problems, it is crucial to 
combine the QMR algorithm with an efficient preconditioning technique. In this section, 
we show how to incorporate preconditioners into the QMR code. Also, we briefly describe 
two preconditioning techniques. 

Let us first treat the problem in some generality. Suppose that we replace the original 
linear system (9.1), Ax = 6, by an equivalent linear system 

Ay = 6, (12-1) 

where A, y, and b are defined suitably to guarantee the equivalence between the two 
systems. Typically, 

A = f A (A), y = /*(*)> ~ b = M b ) 

for some functions f A , /*, and /*. The goal in replacing (9.1) with (12.1) is to replace 
a “hard” system by an “easier” system. In particular, one attempts to ensure that the 

system (12.1) is easier to solve than (9.1), which translates into having a matrix A which is 
more suitable for some solver that A is, and having a function /* which is easily invertible. 
The QMR algorithm for the new system (12.1) is then as follows: 

Given an initial guess xq and a nonzero vector w \ , set 


yo = fz(xo), 
fo = b — A yo, 

Pi = INI > 

1 - 

V\ = — ro- 

Pi 

Iterate with A to compute* 

AV {n) = V (n+1) £i n) , 

A T W (n) = W {n+1) H[ n) . 

When appropriate, compute the iterates 

v„ = Vo + <*”>*. 


where, given 

f„ = b - Ay„ = l - Avo - = f o- 


_ y("+ 1 ) 


-1 


Ul P\ 

0 


_ ft( n+1 >ir< n) z n 




) 
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z n is chosen to minimize 



1 

E 

" 

t— » 

1 

- fl( n+1) £ e (n) z„ 


0 . 



When appropriate, compute the iterates 
In =/r _1 (yn) 


Note that the term being minimized is part of f n ; one could attempt to correlate this 
with r n . 

One approach in preconditioning is the following. Given a preconditioner matrix M , 
with M « A in some sense, one uses a decomposition of M into 

M = M\ M 2 

to precondition the original system Ax = 6, by setting 

A = Mf 1 AM 2 -1 , y = M 2 (x - x 0 ), b = Mf 1 (6 - Ax 0 ). (12.2) 

Clearly, with (12.2), (12.1) is equivalent to the original system (9.1). Furthermore, given 
iterates y n generated for (12.1), one can recover Xn and r n for (9.1) through 

X n = Xo + 1 1/ri) r n = M\fni 

with y 0 = 0 and r 0 = Mf 1 ^. By setting = I or Af 2 = J, one obtains right or left 
preconditioning, respectively, in which cases the above formulas simplify somewhat. In 
general, however, for the QMR algorithm applied to a preconditioned system, one has to 

be able to compute Mf 1 *, M~ T z, M^'z, and M 2 “ T z, for arbitrary vectors z. 

Two examples of preconditioners that fall in the category described above are the 
SSOR and the ILUT preconditioners. We briefly discuss them and our particular imple- 
mentations. 

• SSOR 

The SSOR preconditioner is based on a decomposition of the matrix A into a non- 
singular diagonal matrix D, a strictly lower triangular matrix L, and a strictly upper 
triangular matrix U, such that 

A = D + L + U. 

Note that D might have to be block diagonal to ensure it is nonsingular. For this precon- 
ditioner, one sets M x = I or M 2 = J, depending on whether right or left preconditioning 
is desired. The preconditioner matrix M is given by 

Mssor = (D + uL)D 1 (D + 
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which gives 


Mssor = ( D + uU)- l D{D + uL )~ l , 
for some parameter u>. We take u = 1. 

• ILUT(Jfc) 

The Incomplete LU decomposition is based on the LU decomposition of the coefficient 
matrix A. The full LU decomposition of A would result in factors L and U which, in 
general, have far more nonzero elements than A. The incomplete LU factorization aims to 
reduce this additional fill-in in the factors L and U > 

In ILUT(k), we use the following strategy for dropping non-zero elements which would 
fill-in L and U. Each row of L and U is constructed subject to the restriction that only 
a small amount of fill-in, k more elements for each, is allowed beyond the number of 
elements of A already present in that row (in the lower and upper part, respectively). 
Furthermore, elements which are deemed to make only an insignificant contribution to the 
decomposition are also dropped. For example, this means that if LENL is the maximum 
number of elements allowed for some row of L, K is the actual number of elements of 
that row computed by the elimination process, and TOL is the cutoff tolerance, then the 
algorithm orders the K elements in decreasing order of magnitude, and keeps only up 
to min (K, LENL) elements, or until the elements reach the level TOL , whichever cutoff 
comes first. The resulting matrices L and U are then used as Mi = L, M 2 = U. 

Note that the variant of ILU used is different from the standard one. For a symmetric 
matrix A, the standard ILU preconditioner [10] preserves the sparsity structure of the 
matrix, i.e., for k = 0, the preconditioner matrices have non-zero elements only in those 
locations where A itself has non-zero elements. In [10] it is shown that this strategy does 
produce a good preconditioner, provided that A is a symmetric A/-matrix. For a general 
nonsymmetric matrix, there is no reason to preserve the sparsity structure of A. Since we 
are mainly concerned with nonsymmetric matrices, our implementation discards elements 
subject only to the constraints of fill-in and size, without regard to the sparsity structure 
of A. However, this does mean that if A is symmetric, we do not recover the standard ILU 

preconditioner. 
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13. NUMERICAL EXPERIMENTS 


In this section, we present some numerical examples obtained with our FORTRAN code. 
We show both eigenvalue problems, which involve only the Lanczos algorithm, and non- 
symmetric linear systems, which also involve the QMR algorithm. We also indicate how 
many blocks of size bigger than 2 the algorithm built during each run. We only report 
blocks that were successfully closed; blocks that were rebuilt after updating the estimate 
for the norm of the matrix are not counted. Unless otherwise indicated, the elements of the 
starting vectors t»x and u>i were random numbers from a normal distribution with mean 0.0 
and variance 1.0. Also, in all cases, the user-supplied estimate for the norm of the matrix 
was set to 1, thus forcing the algorithm to estimate the norm on its own. For eigenvalue 
problems, we use the heuristic presented in [2] to identify and eliminate spurious eigenval- 
ues. In the plots, we show the true eigenvalues of the matrix (denoted by as compared 
to the computed Lanczos eigenvalues (denoted by ‘o’). For the linear systems, the starting 
guess x 0 was always zero and the convergence requirement was a reduction by a factor of 
10 — 6 in the norm of the residual. We always used the QMR algorithm with no scaling, 
i.c., ft (n) = /„+ 1 in (9.7), and we preconditioned the systems as discussed in Section 12. 
For the SSOR preconditioners, we always used w = 1.0; for the ILUT preconditioners, we 
used ILUT(O) and ILUT(4), with the threshold set to TOL = 0.001. While we only show 
results for the right SSOR preconditioner, we also ran the left SSOR preconditioner, and 
in all cases, the number of iterations needed to converge was roughly the same as for the 
right SSOR preconditioner; we prefer the right preconditioner because its residual vector is 
identical to the residual vector of the unpreconditioned system. In the plots, we show two 
convergence curves: the top curve (solid) is the upper bound (10.10) used in the algorithm 
for the residual norm, while the lower curve (dotted) is the computed residual norm. In 
practice, one would monitor the upper bound until it reached the convergence range, and 
then possibly switch to computing the true residual norm until convergence. The vertical 
scale is the same on all the convergence plots, but the horizontal scale usually changes 
from one plot to the next. Finally, unless otherwise indicated, all the examples were run 
on a Sim Sparc Station 1, in double precision. 


Example 1. The first example is taken from [1]. We include it here to make the point 
that, as indicated in Section 2, singular and deficient polynomials do indeed occur. We 
have 


A = 


■ 0 0 

1 0 

0 1 

0 0 

0 0 

0 0 

0 0 

. 0 0 


0 0 0 0 
0 0 0 0 
0 0 0 0 
10 0 0 
0 10 0 
0 0 10 
0 0 0 1 
0 0 0 0 


0 

-1' 


-1- 


o- 

0 

0 


1 


0 

0 

0 


0 


1 

0 

0 


0 


0 

0 

0 

, v i = 

0 

, t»l = 

0 

0 

0 


0 


-1 

0 

0 


0 


0 

1 

0. 


.0. 


. 0. 


With these starting vectors, the algorithm builds a 2 x 2 block, followed by a 4 x 4 block, 
followed by 1 x 1 blocks. The 2nd and 4th degree polynomials are singular, while the 
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5th, and 6th degree polynomials are deficient. After 8 steps, the algorithm finds invariant 
subspaces with respect to both A and A T , and the computed eigenvalues match the true 
eigenvalues exactly, as shown in Figure 1. 

Example 2. This example is an eigenvalue problem, taken from [12], whose exact eigen- 
values are known. Generally, problems of this type arise in modeling concentration waves 
in reaction and transport interaction of chemical solutions in a tubular reactor. The par- 
ticular test problem used here corresponds to the so-called Brusselator wave model. We 
took N = 200 and computed eigenvalues after 20 (Figure 2) and after 100 (Figure 3) 
Lanczos steps. For the latter case, the algorithm built 2 blocks of size 2. As expected, the 
extreme eigenvalues converge rapidly, while the interior ones require more iterations. For 
this example, the Lanczos algorithm computes all the eigenvalues after 200 steps; however, 
it requires as much work as the direct computation of the eigenvalues would require. 

Example 3. This example comes from a problem in hydrodynamics. The aim is to 
model a 2-D wave sloshing in a tank using integral equations and Green’s theorem. The 
approach used is to transform the elliptic mixed boundary-value problem into boundary 
integrals, leading to mixed first and second kind equations, and then to discretize the 
integral equations. This is done at every time step. It yields a dense nonsymmetric 
matrix which is used to solve a system for the velocity at the boundary points. Once the 
velocity is known, the free boundary is then advanced in time. The routine to generate 
the matrix was provided by Dick Yue and Hongbo Xu from the MIT Department of Ocean 
Engineering. We used the matrix as it arises at a time when the wave is beginning to 
overturn (see Figure 4); the right and bottom walls are fixed, the left wall is a wave-maker 
which produced the wave. In Figure 5, we show the convergence results for N = 640, 
with the system preconditioned with the right SSOR preconditioner; the algorithm built 2 
blocks of size 2. In this example, the right-hand side vector was prescribed by the physics 
in the problem. 

Example 4. This example is taken from [3]. The following partial differential equation 
models heat conduction in a box, where a side is heated and the flow is rotating. 

-Au + [t), v y t>*]-(Vti) = 0 on (— 1,1) x (— 1,1) x (— 1,1), 

f 100 if z = -1, ( 131 ) 

with Dirichlet boundary conditions u = < 

1 0 otherwise. 

Here, 

v x = C p C 0 yz(l-x 2 ) 2 (l-y 2 )(l-z 2 ), 

vy = -C,Cixz{ 1 - y 2 ) 2 (l - * 2 )(1 - z 2 ), 
v t = —C p C\xy{\ - z 2 ) 2 ( 1 - * 2 )(1 - y 2 ), 

Co = 27/2, Ci = Co/2, 
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and C p > 0 is a free parameter. We discretize (13.1) using centered differences on a uniform 
m x m x m grid with mesh size h = 2/(m + 1). The resulting linear system has a sparse 

coefficient matrix A of order N = m 3 . For our experiments, we have chosen the parameter 
C p = 1/h. Note that this choice guarantees that the cell Reynolds number is smaller than 

one, and hence centered differences yield a stable discretization of (13.1). In this and all 
subsequent sparse examples, we have relied heavily on routines from SPARSKIT [14] for 
the basic matrix operations. In Figure 6, we show the convergence results for m = 31 
(N = 29791), with the system preconditioned with the right SSOR preconditioner; the 
algorithm built only lxl blocks. The right-hand side vector was chosen to describe 
a physically possible solution. This example was run on a Cray-2 at the NASA Ames 
Research Center. 

Example 5 and 6. Finally, Examples 5 and 6 were taken from the Harwell-Boeing sparse 
matrix collection [4]. On these examples, we ran both the SSOR preconditioners, and two 
versions of the ILUT preconditioner, namely ILUT(O) and ILUT(4). Example 5 is the first 
matrix from the OILGEN coUection, called ORSREG 1. It comes from an oil reservoir 
simulation on a 21 x 21 x 5 full grid; the order of the matrix is 2205, and it has 14133 non- zero 
elements. In Figure 7, we show the convergence curves for the right SSOR preconditioner, 
in Figure 8, the convergence curves for ILUT(O), and in Figure 9, the convergence curves 
for ILUT(4). The algorithm built one block of size 2 for the SSOR preconditioner, one 
block of size 2 and one of size 3 for the ILUT(O) case, and no blocks of size bigger than 
1 for the ILUT(4) case. Example 6 is the fifth matrix from the SHERMAN collection, 
called SHERMAN 5. It comes from a fully implicit black oil simulator on a 16 x 23 x 3 
grid, with three unknowns. The order of the matrix is 3312, and it has 20793 non-zero 
elements. In Figure 10, we show the convergence curves for the right SSOR preconditioner, 
in Figure 11, the convergence curves for ILUT(O), and in Figure 12, the convergence curves 
for ILUT(4). For this case, the convergence curves for the two ILUT preconditioners are 
almost identical. The algorithm built one block of size 2 for the SSOR preconditioner, 
three blocks of size 2 for the ILUT(O) case, and one block of size 2 for the ILUT(4) case. 
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Figure 1. Eigenvalues of Example 1. 



Figure 2. Eigenvalues of the Brusselator example after 20 steps. 
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Figure 12. QMR convergence curves for Example 6, ILUT(4). 
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14. CONCLUDING REMARKS 


We have proposed a robust iterative solver for non-Hermitian linear systems. Based on the 
look-ahead Lanczos algorithm described in Part I [6] of the paper, the method generates 
iterates which are characterized by a quasi-minimal residual (QMR) property. The QMR 
approach is closely related to the biconjugate gradient (BCG) algorithm; however, unlike 
BCG, the QMR algorithm has smooth convergence curves and good numerical properties. 

For the case of real nonsymmetric matrices A, we have FORTRAN implementations 
of the QMR method and the underlying look-ahead Lanczos algorithm. These programs 
are listed in the Appendix. These codes are available electronically from the authors 
(na.freund@na-net.stanford.edu or na.nachtigad@na-net.stamford.edu). 


Acknowledgements. The authors would like to thank Youcef Saad, who provided us 
with routines for generating test Example 2, and Dick Yue and Hongbo Xu for generating 
the matrices for test Example 3. 


26 


REFERENCES 


[1] Boley, D., Elhay, S., Golub, G. H., and Gutknecht, M. H. Nonsymmet- 
ric Lanczos and finding orthogonal polynomials associated with indefinite weights. 
Numerical Analysis Report NA-90-09, Stanford, August 1990. 

[2] Cullum, J., AND WILLOUGHBY, R. A. A practical procedure for computing eigen- 
values of large sparse nonsymmetric matrices. In Large Scale Eigenvalue Problems , J. 
Cullum and R.A. Willoughby, Eds., North-Holland, 1986, 193-240. 

[3] DOI, S., AND LlCHNEWSKY, A. Some parallel and vector implementations of pre- 
conditioned iterative methods on Cray-2. International Journal of High Speed Com- 
puting 2 (1990), 143-179. 

[4] Duff, I. S., Grimes, R. G., and Lewis, J. G. Sparse matrix test problems. 
ACM Trans. Math. Softw. 15 (1989), 1-14. 

[5] Freund, R. W. Conjugate gradient type methods for linear systems with com- 
plex sy mm etric coefficient matrices. Technical Report 89.54, RIACS, NASA Ames 
Research Center, December 1989. 

[6] Freund, R. W., Gutknecht, M. H., and Nachtigal, N. M. An implementation 
of the look-ahead Lanczos algorithm for non-Hermitian matrices, Part I. Technical 
Report 90.45, RIACS, NASA Ames Research Center, November 1990. 

[7] FREUND, R. W., AND Nachtigal, N. M. QMR: a quasi-minimal residual method 
for non-Hermitian linear systems. Technical Report, RIACS, NASA Ames Research 
Center, 1990, in preparation. 

[8] Golub, G. H., and Van Loan, C. F. Matrix computations. First edition, The 
Johns Hopkins University Press, Baltimore, 1983. 

[9] LANCZOS, C. Solution of systems of linear equations by minimized iterations. J. 
Res. Natl. Bur. Stand. (1952), 33—53. 

[10] MEIJERINK, J. A., AND VAN DER VORST, H. A. An iterative solution for linear 
systems of which the coefficient matrix is a symmetric M- matrix. Math. Comp. SI 
(1977), 148-162. 

[11] Paige, C. C., AND Saunders, M. A. Solution of sparse indefinite systems of linear 
equations. SIAM J. Numer. Anal. 12 (1975), 617—629. 

[12] PARLETT, B. N., and Saad, Y. Complex shift and invert strategies for real ma- 
trices. Linear Algebra Appl. 88/89 (1987), 575-595. 

[13] Saad, Y. The Lanczos biorthogonalization algorithm and other oblique projection 
methods for solving large unsymmetric systems. SIAM J. Numer. Anal. 19 (1982), 
485-506. 


27 



[14] SAAD, Y. SPARSKIT: a basic tool kit for sparse matrix computations. Technical 
Report 90.20, RIACS, NASA Ames Research Center, May 1990. 


28 


APPENDIX 


In this appendix we present FORTRAN codes listings for the Lanczos and QMR routines. 
We also make the correspondence between variables in the code and their name in this 
paper. Under each routine, each variable also appearing in the paper is listed as it appears 
in the code, followed by its name in the paper and an equation or a section number. The 
equation number listed is either the defining equation, if one exists, or the first equation 
where the variable appears, if a defining equation does not exist. In the latter case, the 
variable is usually defined in the text before or after the equation listed. The section 
number appears for variables which are defined in the introduction or in the first part of 
the notation section. 

The work estimates for one step of this implementation of the look-ahead Lanczos 
algorithm are as follows: 

• On the average, the algorithm builds lxl blocks, and requires: 


1 multiplication by A 

1 multiplication by A T 


2 • (n — n/s-i + 1) DAXPY operations of length N 


4 DDOT operations of length N 


• If building a pair of inner vectors, then the worst case requires: 


1 

1 

2 • (n - n fc _ i + 1) + 4 


5 

1 

1 


multiplication by A 

multiplication by A T 

DAXPY operations of length N 

DDOT operations of length N 

SVD of a (n — n *- i + 1) x (n — rik-i + 1) matrix 

(n — njt_i + l) 3 work to compute the inverse 


The additional work comes from the inner recursion (DAXPY), the SVD of a matrix of 
size bigger than 1 (and computing its inverse), and finally, the one dot product we have 
to recompute in cases when we first build a regular vector and then discover that it fails 
(4.12). In addition, when the Lanczos vectors are seeded, the algorithm performs two 
DSCAL operations of length N (we have ignored in the above 0(n — rik-i + 1) work). 
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The routines Eire organized as follows: 

• DLAL 

This routine is the lowest level routine, implementing one step of the Lanczos algorithm 
with look-ahead. The routine is called both by the eigenvalue code and by the linear 
systems’ solver. 


N = n (3) 

NK =n k (3) 

NKM1 = n*-! (3) 

NLEN = N (1) 

SC(l,i) =5^ (3.1) 

SC(2,i)=j^- (3.1) 

SC(3,i) = ff (3.1) 

‘'I 

SC(4, i) = £,- (3.2) 

SC(5, i) =<7,- (3.2) 

TETA = r) n (3.8) 

TZETA =Cn (3.8) 

VW(:,Q(i)) = * (3.2) 

VW(:,M + Q(i)) =wi (3.2) 

WK(:, 1 : M) = (W?V it) (3.7) 

WK(:, M + 1 : 2 * M) = (WfVk)- 1 (3.7) 

WK(:, 3 * M + 5) = (Wf.jVjt-i)" 1 WjH l (Ala) 

WK(:,3*M + 6) = {WjV k )~ l W^ (Alb) 


• EIGLAL 

This routine is the shell routine used in eigenvalue problems. It sets up the Hessenberg 
matrix that can be used to compute eigenvalue estimates for A, but does not actually 
compute the eigenvalue estimates. In general, some further processing is needed to 
compute the eigenvalues, and then identify and discard spurious and ghost eigenvalues. 

H =#("> 

N = n 
NK =n k 
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( 3 ) 

( 3 ) 

( 3 ) 


NKM1 =n fc _! 

(3) 

NLEN = N 

(1) 

SC(l,i) =sf5 

(3.1) 

S C(2, i) 

(3.1) 

SC(3, i) =fi 

•"t 

(3.1) 

SC(4,i) ={, 

(3.2) 

SC(5,i) = <7i 

(3.2) 

VW(:,Q(i)) = Vi 

(3.2) 

VW(:,M + Q(i)) =Wi 

(3.2) 

WK(:, 1 : M) = (W?V k ) 

(3.7) 

WK(:, M + 1 : 2 * M) = (WfVk)- 1 

(3.7) 

WK(:,3*M + 5) = (W^Vk-^W^ 

(4.1a) 

WK(:,3 * M + 6) = (W^V k )- x Wj 

(4.1b) 


• SYSLAL 

This routine is the shell routine used in solving linear system. It attempts to solve a 
linear system using the Lanczos-QMR algorithm. 


NLEN =N 

(1) 

SC(l,i) = ^=Pi 

(3.1), (8.9) 

Se(2,i) 

(3.1) 

SC(3,i) = fj 

(3.1) 

SC(4,i) =?. 

(3.2) 

SC(5, i) = a 

(3.2) 

VW(:,Q(i)) =t>i 

(3.2) 

VW(:,M + Q(i)) =w, 

(3.2) 

VW(:,2*M + 3) = b 

(9.1) 

VW(:,2*M + 4) — x n 

(10.14) 

VW(:,2*M + 5 :3*M + 4) =[P k - 1 P*] 

(10.13) 

WK(:,1:M) = {W?V k ) 

(3.7) 

WK(:, M + 1 : 2 * M) = (W^V*) -1 

(3.7) 

WK(:, 3 * M + 5) = {W^Vk-^W^ 

(4.1a) 
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WK(:,3*M + 6) 

WK(Q(i),3*M + 9) 
WK(Q(i), 3 * M + 10) 
WK(Q(i),3*M + ll) 

WK(Q(i),3 * M + 12) 

WK(Q(i),3 * M 4- 13) 

WK(:, 3 * M + 15 : 4 * M + 14) 
WK(:, 4 : M + 15 : 5 : M + 14) 


=(wTv t )-'wr 

(4.1b) 

= Wf 

(9.7) 

= c, 

(10.2) 

= Si 

(10.2) 

= (<<">). 

(9.11) 

- (*T), 

(10.15) 

=n 

(10.15) 

= Rk 

(10.15) 


• DETA 

This routine computes one of the coefficients for the inner recursion. 

DETA(i) = T/i (3.8) 


• DZETA 

This routine computes one of the coefficients for the inner recursion. 

DZETA(i) = Ci (3.8) 


• GETOMG 

This routine computes the seeding factors w,- for the QMR algorit hm . 

GETOMG(i) = ui (9.7) 
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£****★★★★********★************★**★★★******★*★**★************★ ********** 

c 

C Copyright (C) 1990, Roland W. Freund and Noel M. Nachtigal 

C All rights reserved. 

C 

C No part of this code may be reproduced, stored in a retrieval 

C system, translated, transcribed, transmitted or distributed in 

C any form or by any means means, manual, electric, electronic, 

C electo-magnetic, mechanical, chemical, optical, photocopying, 

C recording, or otherwise, without the prior explicit written 

C permission of the author (s) or their designated proxies. In no 

C event shall the above copyright notice be removed or altered in 

C any way . 

C 

C This code is provided "as is", without any warranty of any kind, 
C either expressed or implied, including but not limited to, any 

C implied warranty of merchantibility or fitness for any purpose. 

C In no event will any party who distributed the code be liable for 

C damages or for any claim (s) by any other party, including but not 

C limited to, any lost profits, lost monies, lost data or data 

C rendered inaccurate, losses sustained by third parties, or any 

C other special, incidental or consequential damages arising out of 

C the use or inability to use the program, even if the possibility 

C of such damages has been advised against. The entire risk as to 

C the quality, the performance, and the fitness of the program for 
C any particular purpose lies with the party using the code. 

C 

C ***************************************************************** 

C ANY USE OF THIS CODE CONST ITUES ACCEPTANCE OF THE TERMS OF THE 

C ABOVE STATEMENTS 

C ***************************************************************** 

c 

C** ********************************************************** ********** 

c 

C This file contains the basic routines for the look-ahead Lanczos 

C algorithm. DLAL carries out one step of the algorithm and DSCALE 

C is used to scale the Lanczos vectors. DEPS is used by DSCALE to 

C compute machine epsilon, DADD is called by DEPS to ensure that no 

C unwanted optimization takes place, and DZERO is called by DLAL to 

C zero out vectors (it is also used elsewhere in the code) . 

C 

C********************************************************************** 

c 

SUBROUTINE DLAL (NDIM, NLEN, M, N, NK, NKM1 , VW, SC, WK,Q, NORMS , TOL, TF, 

$ INFO) 

C 

C Purpose : 

C This subroutine carries out one step of the look-ahead Lanczos 
C algorithm. The matrix A is referenced solely through the external 

C subroutines AXB and ATXB, which are of the form: 

C 

C SUBROUTINE AXB (X,B) - computes B - A * X. 

C SUBROUTINE ATXB (X,B) - computes B - A A T * X. 

C 

C Most of the inputs to this routine are not checked for validity; 

C it is the resposibility of the caller to ensure that the various 

C dimensions, indices, and data are valid, and that the output unit 

C TF (when applicable) is ready for output. The only inputs checked 

C are the tolerances in TOL, which, when they are non-positive, are 

C replaced with default values as follows: 

C TOL ( 1 ) - 1/TOL (2 ) 

C TOL (2) - eps A { 1/2 } 

C TOL ( 3 ) = eps'Ml/'S) 

C TOL ( 4 ) = eps* { 1/3 ) 

C Here, eps is machine epsilon. 

C Normally, this routine is not called directly from the top-most 



o n 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


level, but rather from an intermediate routine that uses Lanczos 
to solve either eigenvalue problems or linear systems . The caller 
initializes the following: 

VW(:,Q(1)) - the first Lanczos vector V_1 

VW(:,M+Q(1)) - the first Lanczos vector W_1 

SC(:,1) * the scaling factors for V_1 and W_1 

WK (1,1) - the dot product W_1^T V_1 

Q - the array of wrapped indices 

NORMS (1) - the initial estimate for the norm of A 

T0L(1:4) - tolerances (optional) 

Thereafter, it is usually left up to this routine to update the 
variables used. Upon exit, N, NK, NKM1, VW, SC, WK, NORMS, TOL, 
and INFO might be changed. 

Parameters : 

NDIM >= the dimensioned size of the array VW (input) . 

NLEN - the actual size of the Lanczos vectors V and W; this also 
implicitly determines the size of the matrix A (input) . 

M • the maximum number of Lanczos vectors that can be stored 

in the array VW. It is related to the size of the largest 
block that can be built . The algorithm runs out of memory 
when the number of vectors in the last block and in the 
current block reaches M. For this reason, M must be at 
least 3; note that this is not checked (input) . 

N *= the index of the last pair of vectors ( input /output ) . 

NK = the index of the last regular vectors (input /output) . 

NKM1 ” the index of the next-to-the-last regular vectors (input/ 

output) . 

VW - work array dimensioned (NDIM, 2*M+2) words. It is used to 
store the Lanczos vectors V in VW(:,1:M), the vectors W 
in VW ( : , M+l : 2 *M) , and two temporary vectors used by DLAL 
in VW(:,2*M+1) and VW(:,2*M+2). The Lanczos vectors V and 
W are stored wrapped, i.e., V_N is stored in VW(:,Q(N)), 
and W_N is stored in VW ( : ,M+Q (N) ) , where Q(N) is assumed 
to be a wrapped index array — see below (input /output) . 
SC * work array dimensioned (5,M) words, used for the various 
scale factors. We have: 

SC (1, i) - S (i) / S(i-l) 

SC (2, i) - T (i) / T(i-l) 

SC (3, i) - S (i) / T (i) 

SC (4 , i) - CSI(i) 

SC (5, i) - SIG(i) 

Note that the scale routine DSCALE expects to receive the 
scale factors in a 5x1 vector as the one described above 
(input /output) . 

WK - work array dimensioned (M,3*M+6) words, used for internal 
variables. We have: 

WK ( : , 1 :M) - (W_{NK} A T V_{NK) ) 

WK ( : , M+l : 2*M) - (WJNK} A T VJNK)) A {-1) 

WK ( : , 2*M+1 : 3*M) • work array for the SVD 
WK(:,3*M+1) - temporary vector 

WK(:,3*M+2) - temporary vector 

WK(:,3*M+3) - temporary vector 

WK(:,3*M+4) - the saved last column of the matrix 

(W_{NKM1 } A T V_{NKM1 } ) A {-1 } 

WK ( : , 3*M+5) - H(NK:NKP1,N) 

WK ( : , 3*M+6 ) - H (NK:NKP1,N+1) 

Of particular interest to the caller are H(NK:NKP1,N) and 
H (NK:NKP1,N+1) , since they are parts of the columns of H. 
Usually, the caller will extract these after each call 
and either store them so as to form H (for eigenvalues) 
or use them to update X_N (for linear systems) (input/ 
output) . 

q - integer array specifying the indices for all the wrapped 

variables (V,W, SC,WK) . To allow the algorithm to run more 
than M steps, these variable wrap around, in that Q(I) is 
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the index of the slots where that variables are stored at 
the I-th step. Normally, these indices would be in order, 
basically Q(I) - I MOD M + 1, but the algorithm makes no 
assumptions to this effect. These indices are not checked 
in any way for validity (input) . 

NORMS — vector with estimates for the norm of A. NORMS (1) is the 
current estimate. It should be at least slightly larger 
than 1.0, to avoid the possible effects of roundoff for 
the identity matrix. For this reason, the routine will 
change it to 2.0 if it is smaller than 2.0. NORMS (2) is 
what the estimate should be to allow closure of blocks 
caused by the norm check. A value of 0.0 indicates that 
no estimates are available, i.e., all inner vectors were 
built due to the moment matrix being singular. The second 
estimate is usually used in conjunction with restarting a 
block if the algorithm runs out of memory (input/output ) . 
TOL - vector with the tolerances used in the various checks . We 
have : 

TOL ( 1 ) « upper bound for the range of CSI and SIG 

TOL (2) - lower bound for the range of CSI and SIG 

TOL (3) - convergence tolerance for the norms of the 
Lanczos vectors 

TOL (4) - level below which the singular values of 
(W_{NK) A T V_{NK}) are considered zero 
Note that the scale routine DSCALE expects to receive the 
first three tolerances in a 3x1 vector as described. The 
values supplied by the caller are checked for validity 
and default values (see above) are supplied if the user 
provides a non-positive value for any of the tolerances 
(input /output) . 

TF - output unit for a trace file. If TF non-zero, the routine 
will output to unit TF trace messages detailing execution 
decisions . The output unit is assumed to be available and 
ready (input) . 

INFO - information passing variable. 

On input : 

INFO ■» 0 --> proceed normally 

INFO - 1 --> closing the block strongly recommended; 

if the block doesn't close naturally, do 
not update the counters, but update the 
norm estimate, if possible. 

Upon exit : 

INFO - 0 — > nothing to report 

INFO < 0 --> the SVD routine returned this error code 

(but with positive sign) 

INFO - 1 — > an A-invariant subspace has been found 

INFO - 2 —> an A A T-invariant subspace has been found 

INFO - 3 — > both subspaces have been found 

INFO - 4 --> the block did not close, though strongly 
recommended (INFO-1 on input) ; updated 
norm estimate, but did not compute any 
vectors and did not update the counters. 

(input /output ) . 

External routines used; 
subroutine axb(x,b) 

Computes b - A * x. 
subroutine atxb(x,b) 

Computes b - A A T * x. 
subroutine daxpy (n, da, dx, incx, dy, incy ) 

Computes dy - da * dx + dy. 
subroutine dcopy (n, dx, incx, dy, incy) 

Computes dy - dx. 

double precision ddot (n, dx, incx, dy, incy) 

Computes dy' * dx. 
double precision deta(i) 
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C Computes the second recursion scalar for the inner vectors. 

C subroutine dscal (n, da, dx, incx) 

C Computes dx • da * dx. 

C subroutine dscale (nact , v, w, sc, tol) 

C Computes the scaling factors for v and w. 

C subroutine dsvdc (x, ldx, n,p, s, e, u, ldu, v, ldv, work, job, info) 

C Computes the singular value decomposition of x. 

C subroutine dzero (n, dx, incx) 

C Zeros out dx. 

C double precision dzeta(i) 

C Computes the first recursion scalar for the inner vectors. 

C 

C Noel M. Nachtigal 

C August 28, 1990 

C 

Q********************************************************************** 

c 

INTRINSIC ABS, MAX, MIN 
EXTERNAL DDOT f DEPS, DETA, DZETA 
DOUBLE PRECISION DDOT, DEPS, DETA, DZETA 
C 

INTEGER INFO, M, N, NDIM, NK, NKM1, NLEN, Q(*), TF 
DOUBLE PRECISION NORMS (2), SC(5,M), TOL (4) 

DOUBLE PRECISION VW (NDIM, 2*M+2) , WK(M,3*M+6) 

Local variables . 

INTEGER HBASE, I, J, LCL, LINFO, NP1 

DOUBLE PRECISION ANORM, INVCSI, INVSIG, DTMP, DTMP1, DTMP2 
DOUBLE PRECISION NUNORM, TETA, TZETA, WNV, WNP1V 
LOGICAL INNER 

Check the tolerances and the norm estimate. 

IF (TOL (2 ) . LE . 0 . 0 ) TOL(2) - DEPS ()** (1 . 0/2 . 0) 

IF (TOL(l) .LE.0.0) TOL (1) - 1.0 / DEPS() 

IF (TOL (3) .LE.0.0) TOL(3) - DEPS()**(1. 0/4.0) 

IF (TOL(4) .LE.0.0) TOL(4) - DEPS()**(1. 0/3.0) 

NORMS ( 1 ) ■ MAX (NORMS (1) , 2 . 0D0 ) 

Compute local counters. 

NP1 - N + 1 
LCL - N - NK + 1 
HBASE - 1 - NKM1 
IF (NKM1.EQ.0) HBASE - 0 

Initialize the norm estimate. 

ANORM - NORMS (1) 

NUNORM - NORMS (2) 

Zero out the work portion of the column of H. 

CALL DZERO (LCL, WK (HBASE+NK, 3*M+5) , 1) 


Compute the matrix-vector products. 

CALL AXB (VW(1,Q(N) ) ,VW(1,2*M+1) ) 

CALL ATXB (VW (1,M+Q (N) ) , VW(1, 2*M+2) ) 

Subtract the part common to both types of vectors. This is also 
done to enhance the numerical properties. 

INVSIG “ 1.0 / SC (5 , Q (N) ) 

INVCSI - 1.0 / SC (4 , Q (N) ) 


IF (NKM1.GT.0) THEN 

DO 10 I - NKM1, NK-1 

DTMP2 - WK (HBASE+I, 3*M+5) 

DTMP1 » INVSIG * SC (5, Q (I) ) * DTMP 2 

CALL DAXPY (NLEN, -DTMP1 , VW (1, Q (I) ) , 1, VW (1 , 2*M+1) , 1) 

DTMP1 - INVCSI * SC (4 , Q (I) ) * DTMP2 * SC(3,Q(N)) 

$ / SC (3,Q (I) ) 

CALL DAXPY (NLEN, -DTMP1 , VW ( 1, M+Q ( I) ) , 1, VW (1 , 2*M+2 ) , 1) 

10 CONTINUE 

END IF 
C 

C Compute the inner product (W_N A T A \tilde{V}_N) . 

C 

WNV - DDOT (NLEN, VW (1,M+Q (N) ) , 1, VW (1, 2*M+1) , 1) 

C 

C Compute the SVD of (W_{NK} A T V_{NK}). We have to copy it first to 

C a temporary array, since the DSVDC routine destroys its argument. 

C From the DSVDC routine, we want both singular values and singular 
C vectors, hence JOB =11. 

C 

DO 20 J = 1, LCL 

CALL DCOPY (LCL, WK (1 , J) , 1 , WK (1 , M+J) , 1) 

20 CONTINUE 

CALL DSVDC (WK (1, M+l ) ,M, LCL, LCL, WK ( 1, 3*M+3) , WK (1, 3*M+2) , 

$ WK (1 , 2*M+1 ) , M, WK ( 1 , M+l ) , M, WK (1 , 3*M+1 ) , 11, LINFO) 

C 

C Check for error in DSVDC; abort on error. 

C 

LINFO LINFO 

IF (LINFO. NE.O) GO TO 120 
C 

C Check the smallest singular value to determine singularity of the 
C moment matrix (W_{NK} A T V_{NK}). 

C 

DTMP = WK (1, 3*M+3) 

DO 30 I = 2, LCL 

DTMP = MIN (DTMP, WK (I, 3*M+3) ) 

30 CONTINUE 

INNER = DTMP . LE . TOL ( 4 ) 

IF (INNER. AND. (TF. NE.O) ) THEN 

WRITE (TF, ' (A7 , 15, A21 ) ' ) 'Vector ',NP1,' is an inner vector; ' 
WRITE (TF, ' (A31, E10 . 3) ' ) ' ==> moment matrix is singular: ',DTMP 
END IF 
C 

IF (.NOT. INNER) THEN 
C 

C Matrix is not singular, compute its inverse. WK(:,M+I) has the 

C right singular vectors, WK(:,2*M+I) has the left singular vectors 

C and WK(:,3*M+3) contains the singular values. We save the inverse 

C in WK( : ,M+I ) . 

C 

DO 40 I = 1, LCL 

CALL DSCAL (LCL, 1 . 0/WK ( I, 3*M+3) , WK (1, M+I) , 1) 

40 CONTINUE 

DO 60 I = 1, LCL 
DO 50 J = 1, LCL 

WK ( J, 3*M+2 ) = DDOT (LCL, WK (I, M+l) ,M, WK ( J, 2*M+1) ,M) 

50 CONTINUE 

CALL DCOPY (LCL, WK (1, 3*M+2 ) , 1, WK ( I, M+l) ,M) 

60 CONTINUE 

C 

C Compute the (W_{NK} A T A V_N) term. 

C 

WK (LCL, 3*M+2 ) = SC(5,Q(N)) * SC(4,Q(N)) * WNV 
DO 70 I = 1, LCL-1 

DTMP = SC (2,Q(NK+I) ) * WK(I+1,LCL) + DZETA(I-l) * WK(I,LCL) 
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IF (I.GT.l) THEN 

DTMP - DTMP + DETA(I-l) * WK(I-1,LCL) / SC (2 , Q (NK+I-1) ) 
END IF 

WK(I, 3*M+2) - DTMP 
CONTINUE 

Compute H (NK: N, N) - (W_{NK} A T V_{NK)) A {-1) W_{NK} A T A V_N. 

Check whether we can build a regular vector with it . 

DTMP1 - 0.0 
DTMP 2 - 0.0 
DO 80 I - 1, LCL 

WK (I, 3*M+1) - DDOT (LCL, WK (I , M+l) , M, WK (1 , 3*M+2 ) , 1) 

DTMP - ABS (WK (I , 3*M+1 ) ) 

DTMP1 = DTMP1 + DTMP 

DTMP 2 - DTMP 2 + DTMP / SC (3, Q (NK+I-1) ) 

CONTINUE 

DTMP 2 - SC (3, Q (N) ) * DTMP 2 

INNER - (DTMP1 .GT.ANORM) .OR. (DTMP2 .GT . ANORM) 

IF (INNER) THEN 

One or both of the norm checks has failed, we have to build an 
inner vector. Output trace messages and update norm estimate. 

IF (TF.NE.0) THEN 

WRITE (6, ' (A7, 15, A21) ' ) 

$ 'Vector ' , NP1, ' is an inner vector;' 

IF ( DTMP 1. GT.ANORM) WRITE (6, ' (A29, E10 .3) ' ) 

$ '--> second term in V is bad: ' , DTMP1/ANORM 

IF (DTMP 2. GT.ANORM) WRITE (6, ' (A29,E10 .3) ' ) 

$ '— > second term in W is bad: ' ,DTMP2/ANORM 

END IF 

DTMP1 - MAX (DTMP1 , DTMP2 ) 

IF (NUNORM . EQ . 0 . 0 ) THEN 
NUNORM - DTMP1 
ELSE 

NUNORM - MIN (NUNORM, DTMP 1) 

END IF 
END IF 
END IF 

IF (.NOT. INNER) THEN 

We can build a regular vector, let's do it. Copy the temporary 
vectors to the proper slots in V and W. 

CALL DCOPY (NLEN, VW (1, 2*M+1) , 1 , VW(1, Q (NP1) ) , 1) 

CALL DCOPY (NLEN, VW (1, 2*M+2) , 1 , VW(1,M+Q (NP1) ) , 1) 

Add in the term V_{NK) H (NK :N, N) . 

DO 90 I - NK, N 

DTMP 2 - WK ( I-NK+1 , 3*M+1) 

DTMP1 - INVSIG * SC (5, Q (I) ) * DTMP2 

CALL DAXPY (NLEN, -DTMP1 , VW (1, Q (I) ) , 1, VW (1, Q (NP1) ) , 1) 

DTMP1 - INVCSI * SC(4,Q(D) * DTMP2 * SC(3,Q(N)) 

$ / SC (3, Q (I) ) 

CALL DAXPY (NLEN, -DTMP1 , VW ( 1, M+Q (I) ) , 1, VW (1 ,M+Q (NP1 ) ) , 1 ) 
CONTINUE 

Scale the new vectors . 

SC (3, Q (NP1) ) - SC (3, Q (N) ) 

SC (4 , Q (NP1) ) - SC ( 4 , Q (N) ) 

SC (5, Q (NP1) ) - SC ( 5, Q (N) ) 

CALL D SCALE (NLEN, VW ( 1 , Q (NP1 ) ) , VW (1, M+Q (NP1) ) , SC (1 , Q (NP1 ) ) , TOL) 


WK (HBASE+NP1 , 3 *M+5 ) - SC (1 , Q (NP1 ) ) 

C 

C Compute the inner product (W_{N+1} A T V_{N+1}). 

C 

WNP1V - DDOT (NLEN, VW ( 1 , M+Q (NP1 ) ) , 1 , VW ( 1, Q (NP1) ) , 1 ) 

C 

C Compute the term W_{NK} A T A V_{N+1} and its 1-norm. 

C 

CALL DCOPY (LCL,WK(1,M+LCL),1,WK(1,3*M+2),1) 

DTMP = SC (5, Q (NP1) ) * SC(4,Q(NP1)) * SC(2,Q(NP1)) * WNP1V 
CALL DSCAL (LCL, DTMP , WK (1, 3*M+2 ) , 1 ) 

C 

C Check whether we can build either vector at the next step. 

C 

DTMP1 - 0.0 
DTMP 2 =0.0 
DO 100 1=1, LCL 

DTMP = ABS (WK (I , 3*M+2 ) ) 

DTMP1 = DTMP1 + DTMP 

DTMP 2 = DTMP 2 + DTMP / SC (3, Q (NK+I-1) ) 

100 CONTINUE 

DTMP 2 - SC(3,Q(N)) * DTMP 2 

INNER = (DTMP1 .GT.ANORM) .OR. (DTMP2 .GT.ANORM) 

C 

C The next vector would be bad, we have to build an inner vector. 

C Output trace messages and update the norm estimate. 

C 

IF (INNER) THEN 

IF (TF.NE.0) THEN 

WRITE (6, ' (A7, I5,A21) ' ) 

$ 'Vector ',NP1,' is an inner vector;' 

IF ( DTMP 1. GT.ANORM) WRITE (6, ' (A30,E10 .3) ' ) 

$ '==> next vector V will be bad: ', DTMP 1/ANORM 

IF (DTMP 2. GT.ANORM) WRITE (6, ' (A30,E10 .3) ' ) 

$ '==> next vector W will be bad : ' , DTMP 2 / ANORM 

END IF 

DTMP1 - MAX (DTMP1, DTMP2 ) 

IF (NUNORM . EQ . 0 . 0 ) THEN 
NUNORM = DTMP1 
ELSE 

NUNORM - MIN (NUNORM, DTMP 1) 

END IF 
END IF 
END IF 
C 


IF (INNER) THEN 
C 

C Check whether we were supposed to close the block. If so, return 

C without computing anything else . 

C 

IF (INFO.NE.O) THEN 
LINFO = 4 
GO TO 120 
END IF 


C 

C We are building an inner vector. Copy the temporary vectors to 

C the proper slots in V and W. 

C 

CALL DCOPY (NLEN, VW (1, 2*M+1) , 1 , VW (1, Q (NP1) ) , 1) 

CALL DCOPY (NLEN, VW(1, 2*M+2) , 1, VW (1,M+Q(NP1) ) , 1) 

C 

C Add the terms from the inner vector recursion. 

C 

TZETA = DZETA(N-NK) 

WK (HBASE+N, 3*M+5 ) = TZETA 

CALL DAXPY (NLEN, -TZETA, VW ( 1 , Q (N) ), 1 , VW ( 1 , Q (NP1) ), 1 ) 
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CALL DAXPY (NLEN, -TZETA, VW (1 ,M+Q (N) ) , 1, VW (1, M+Q (NP1 ) ) , 1) 

IF (LCL.GT.l) THEN 
TETA - DETA(N-NK) 

DTMP2 - TETA * INVSIG * SC ( 5, Q (N-l)) / SC(1,Q(N)) 

CALL DAXPY (NLEN, -DTMP2 , VW (1, Q (N-l) ) , 1, VW (1 , Q (NP1 ) ) , 1 ) 

DTMP2 - TETA * INVCSI * SC (4, Q (N-l)) / SC(2,Q(N)) 

CALL DAXPY (NLEN, -DTMP2 , VW (1, M+Q (N-l) ), 1, VW (1 , M+Q (NP1 )), 1 ) 
WK ( HBASE+N- 1 , 3 *M+ 5 ) - TETA / SC(1,Q(N)) 

END IF 

Scale the new vectors. 

SC (3,Q (NP1) ) - SC (3, Q (N) ) 

SC (5,Q (NP1) ) - SC (5, Q (N) ) 

SC (4 , Q (NP1 ) ) - SC (4 , Q (N) ) 

CALL DSCALE (NLEN, VW (1 , Q (NP1 ) ) , VW (1,M+Q (NP1) ) , SC (1 , Q (NP1 ) ) , TOL) 
WK (HBASE+NP1 , 3*M+5 ) - SC(1,Q(NP1)) 

Compute the inner product (W_{N+1} A T V_{N+1}) . 

WNP1V - DDOT (NLEN, VW (1 ,M+Q (NP1 ) ) , 1, VW(1, Q (NP1) ) , 1) 

Update the matrix (W_{NK) A T V_{NK}). 

WK (LCL+1 , LCL+1 ) - SC (5,Q(NP1) ) * SC(4,Q(NP1)) * WNP1V 
DTMP - SC ( 5 , Q (N) ) * SC(4,Q(N)) * WNV - TZETA * WK(LCL,LCL) 

IF (LCL.GT.l) THEN 

DTMP - DTMP - TZETA * WK (LCL, LCL-1) / SC(1,Q(N)) 

END IF 

WK (LCL, LCL+1) - DTMP / SC(1,Q(NP1)) 

WK (LCL+1, LCL) - DTMP / SC(2,Q(NP1)) 

DTMP 2 - SC (3, Q (NP1) ) / SC(3,Q(N)) 

DO 110 I - LCL-1, 1, -1 

DTMP1 - (DZETA(I) - TZETA) * WK(I,LCL) 

$ - TETA * WK (I, LCL-1) / SC(1,Q(N)) 

$ + SC (2,Q(NK+I) ) * WK (1+1, LCL) 

IF (I.GT.l) THEN 

DTMP1 - DTMP1 + DETA(I) * WK(I-1,LCL) / SC (2 , Q (NK+I-1) ) 
END IF 

DTMP1 - DTMP1 / SC (1, Q (NP1) ) 

DTMP 2 - DTMP 2 * SC (3, Q (NK+I) ) / SC (3, Q (NK+I-1) ) 

WK( LCL+1, I) - DTMP1 * DTMP 2 
WK (I, LCL+1) - DTMP1 
110 CONTINUE 

Initialize H (NKM1 : NK-1 , N+l ) for the next step. It is the last 
column of (W_{NKM1} A T V_{NKM1}) A {-1>, scaled appropriately. 

IF (NKM1.GT.0) THEN 

DTMP - SC (2 , Q (NK) ) * WK(1, LCL+1) 

CALL DCOPY (NK-NKM1, WK(1, 3*M+4) , 1, WK(1, 3*M+6) , 1) 

CALL DSCAL (NK-NKM1 , DTMP, WK (1, 3*M+6) , 1) 

END IF 

ELSE 

We have built a regular vector. Output trace message. 

IF (TF.NE.O) WRITE ( 6, ' ( A7 , 15, A21) ' ) 

$ 'Vector ' ,NP1, ' is a regular vector.' 

Save H (NK :NKP1-1, N) . 

CALL DCOPY (LCL, WK ( 1 , 3*M+1 ) , 1, WK (HBASE+NK, 3*M+5) , 1 ) 

Save the last column of (W_{NK} A T V_{NK) )*{-!} for next step. 
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c 

CALL DCOPY (LCL, WK (1,M+LCL) , 1, WK (1, 3*M+4) , 1) 

Initialize H (NK:NKP1-1, N+l) for next step. 

CALL DCOPY (LCL, WK (1 , 3*M+2 ) , 1, WK (1 , 3*M+6) , 1) 

WK (1,1) - SC (5, Q (NP1 ) ) * SC (4 , Q (NP1) ) * WNP1V 

Update the counters. 

NKM1 - NK 
NK -N+l 
END IF 

Update the running counter. 

N - N + 1 

Check for termination. 

LINFO - 0 

IF (SC(4,Q(NP1) ) .EQ.-1.0) LINFO = LINFO + 1 
IF (SC ( 5 , Q (NP1) ) . EQ . -1 . 0) LINFO - LINFO + 2 

Save the updated norm estimate and set the INFO variable. 

.20 NORMS (2) - NUNORM 
INFO - LINFO 

RETURN 
END 

********************************************************************* 
DOUBLE PRECISION FUNCTION DADD (X) 

Purpose : 

Computes X + i . 0 . Used by the DEPS function to ensure that the 
optimizer doesn't affect the results. 

Parameters : 

X - the variable to add 1.0 to (input) . 

Noel M. Nachtigal 
November 18, 1987 

DOUBLE PRECISION X 

DADD - X + 1.0 

RETURN 
END 

★ ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★■A- ★★★★★★★★★★★★★★★★* ★★★★★★★★★★★★★★★'A’* 

DOUBLE PRECISION FUNCTION DEPS () 

C 

C Purpose : 

C Computes double precision machine epsilon, the smallest number 

C which, when added to 1.0, gives 1.0. This function could use the 

C radix of the machine to obtain a better estimate, but its result 

C is good enough for our purposes. It does attempt to ensure that 

C any optimization does not affect the result. 

C 

C External routines used: 

C double precision dadd(dx) 
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Computes dx = dx + 1 . Used to get around optimizers. 

Noel M. Nachtigal 
October 25, 1990 

EXTERNAL DADD 
DOUBLE PRECISION DADD 

Local variables. 

DOUBLE PRECISION DTMP 

DTMP - 1.0 

0 IF (DADD (DTMP) .GT.1.0) THEN 
DTMP - DTMP / 2.0 
GO TO 30 
END IF 

DEPS - DTMP 

RETURN 
END 

********************************************************************* 

SUBROUTINE DSCALE (N, V, W, SC, TOL) 

C 

C Purpose: 

C Does scaling in the Lanczos algorithm. Scales the vectors V and W 

C to unit length. Also checks for invariant subspaces. Note that it 

C does not check its inputs for validity. 

C 

C Parameters: 

C N - the length of the vectors (input) . 

C V - the Lanczos vector V, scaled on output (input/output ) . 

C W - the Lanczos vector W, scaled on output (input/output ) . 

C SC(1) - S_{N+1 } / S_N (output). 

C SC (2) - T_{N+1 } / T_N (output). 

C SC (3) - on input, S_N / T_N; on exit, S_{N+1) / T_{N+1} (input/ 

C output) . 

C SC (4) - on input, CSI_N; on exit, CSI_{N+1). If it is -1.0, then 

C the norm of W was less than TOL (3) on input, indicating 

C an invariant subspace for A A T (input /output) . 

C SC (5) - on input, SIGMA_N; on exit, SIGMA_(N+1}. If it is -1.0, 

C then the norm of V was less than TOL (3) on input, 

C indicating an invariant subspace for A (input/output ) . 

C TOL(l) - the level above which vectors will be scaled (input) . 

C TOL (2 ) - the level below which vectors will be scaled (input). 

C TOL (3) - the tolerance level below which the norms of the vectors 

C are treated as zero (input) . 

C 

C External routines used: 

C double precision dnrm2 (n, dx, incx) 

C Returns the 2-norm of dx. 

C subroutine dscal (n, da, dx, incx) 

C Computes dx - da * dx. 

C 

C Noel M. Nachtigal 

C August 28, 1990 

C 

EXTERNAL DNRM2 
DOUBLE PRECISION DNRM2 
C 

INTEGER N 

DOUBLE PRECISION SC(*), TOL (3), V(*), W(*) 



c 

C Local variables . 

C 

DOUBLE PRECISION TMPS, TMPT 
C 

C Initialize the scale factors. 

C 

SC(1) - DNRM2 (N, V, 1) 

SC (2 ) - DNRM2 (N,W, 1) 

IF (SC(1) *SC(5) ,LE.T0L(3) ) SC(5) =-1.0 
IF (SC (2 ) *SC ( 4 ) . LE . TOL ( 3) ) SC(4) =-1.0 
IF ( ( SC ( 4 ) .LT.0.0) . OR . ( SC ( 5 ) .LT.0.0)) RETURN 
C 

C Compute the scale factors and scale the vectors. 

C 

TMPS = 1.0 / SC(1) 

TMPT = 1.0 / SC (2) 

C 

IF ( (TMPS. LE.TOL(2) ) .OR. (TMPS. GE.TOL(l) ) ) THEN 
CALL DSCAL (N,TMPS,V, 1) 

TMPS =1.0 
END IF 

IF ( (TMPT . LE . TOL (2) ) . OR . (TMPT . GE . TOL ( 1) ) ) THEN 
CALL DSCAL (N,TMPT,W,1) 

TMPT =1.0 
END IF 
C 

SC(1) = SC (5) * SC(1) 

SC (2) = SC ( 4 ) * SC (2) 

SC (3) = SC(1) * SC ( 3) / SC (2 ) 

SC (4) = TMPT 
SC (5) = TMPS 
C 

RETURN 

END 

C 

Q************ **************************************** ****************** 

c 

SUBROUTINE DZERO (N,DX, INCX) 

C 

C Purpose : 

C Zeroes out DX. 

C 

C Parameters: 

C N = the length of the vector DX (input) . 

C DX = the vector to be zeroed out (output) . 

C INCX = the increment in the vector DX (input) . 

C 

C Noel M. Nachtigal 

C October 26, 1990 

C 

INTRINSIC MOD 
C 

INTEGER INCX, N 
DOUBLE PRECISION DX(*) 

C 

C Local variables . 

C 

INTEGER I, IX, M, MP1 
C 

IF (N . LE . 0 ) RETURN 
IF (INCX.NE.l) GO TO 40 
C 

C Code for increment equal to 1 . 

C 

M = MOD (N, 8) 
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10 

20 


30 


40 


50 


IF (M.EQ.0) GO TO 20 


DO 10 I - 1, 

M 

DX ( I ) - 0. 

0 

CONTINUE 

IF (N . LT . 8 ) RETURN 

MP1 - M + 1 

DO 30 I - MP1 

N, 

DX (I ) 

0.0 

DX (1+1 ) - 

0.0 

DX (1+2 ) = 

0.0 

DX (1+3) - 

0.0 

DX (1+4 ) - 

0.0 

DX (1+5) - 

0.0 

DX (1+6) - 

0.0 

DX (1+7 ) •= 

0.0 


CONTINUE 

RETURN 

Code for increment not equal to 1 . 

IX - 1 

IF (INCX.LT . 0) IX «= (-N+1) * INCX + 1 
DO 50 I - 1, N 
DX (IX) - 0.0 
IX - IX + INCX 
CONTINUE 


RETURN 

END 

**★*****★★★★★★*★*★★★★★★★★★***★**★*********★*****★****★*■*★************* 


c 

C Copyright (C) 1990, Roland W. Freund and Noel M. Nachtigal 

C All rights reserved. 

C 

C No part of this code may be reproduced, stored in a retrieval 

C system, translated, transcribed, transmitted or distributed in 

C any form or by any means means, manual, electric, electronic, 

C electo-magnetic, mechanical, chemical, optical, photocopying, 

C recording, or otherwise, without the prior explicit written 

C permission of the author (s) or their designated proxies. In no 

C event shall the above copyright notice be removed or altered in 

C any way . 

C 

C This code is provided "as is", without any warranty of any kind, 

C either expressed or implied, including but not limited to, any 

C implied warranty of merchantibility or fitness for any purpose. 

C In no event will any party who distributed the code be liable for 

C damages or for any claim (s) by any other party, including but not 

C limited to, any lost profits, lost monies, lost data or data 

C rendered inaccurate, losses sustained by third parties, or any 

C other special, incidental or consequential damages arising out of 

C the use or inability to use the program, even if the possibility 

C of such damages has been advised against. The entire risk as to 

C the quality, the performance, and the fitness of the program for 

C any particular purpose lies with the party using the code. 

C 

C ******************************* **************************** ****** 

C ANY USE OF THIS CODE CONSTITUES ACCEPTANCE OF THE TERMS OF THE 
C ABOVE STATEMENTS 

C ***************************************************************** 

c 

c****** **************************************************************** 

c 

C This file contains the routines for an eigenvalue solver which 

C uses the Lanczos code. EIGLAL is the basic routine used to obtain 

C the Hessenberg matrix whose eigenvalues are taken as estimates 

C the eigenvalues of A. 

C 


C 

SUBROUTINE EIGLAL (NDIM, NLEN, HDIM, NLIM, M, VW, SC, WK, Q, TOL, ANORM, 

$ INFO, H) 

C 

C Purpose : 

C This subroutine uses the Lanczos algorithm to set up a Hessenberg 

C matrix that can be used to compute eigenvalue estimates for A. It 

C runs the Lanczos algorithm for NLIM steps, storing the columns of 

C H returned by DLAL in H. The caller initializes the following: 

C VW ( : , Q (1) - the first Lanczos vector V_1 

C VW(:,M+Q(1)) - the first Lanczos vector W_1 

C Q - the array of wrapped indices 

C TOL (1:4) - tolerances for the Lanczos algorithm (optional) 

C ANORM - estimate for the norm of the matrix (optional) 

C INFO - the output units, if any 

C If the user provides non-positive values for the tolerances in 

C TOL (1:4), the Lanczos routine DLAL will supply its own defaults. 

C Also, if the user provides a norm estimate of less than 2.0, the 

C Lanczos routine DLAL will set the norm estimate to 2.0. 

C 

C Parameters: 

C NDIM - the dimensioned size of the array VW. Must be at least 1; 


C 

C 

C 

C 


checked for validity (input) . 

the actual size of the Lanczos vectors V and W; this also 
implicitly determines the size of the matrix A. Must be 
less than or equal to NDIM; checked for validity (input) . 


NLEN 



HDIM 


checked 


C 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 
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c 
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c 

c 
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c 

c 

c 

c 

c 

c 

c 
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NLIM 


M 


vw 


sc 


WK 


Q 


TOL 


= the dimensioned size of H. Must be at least 1; 

for validity (input) . 

•- the maximum number of steps the algorithm can take before 
H overflows. Must be less than or equal to HDIM; checked 
for validity. On exit, it is the size of the usable part 
of H, i.e., the eigenvalues of H (1 : NLIM, 1 : NLIM) can be 
used as estimates for NLIM of the eigenvalues of A 
(input /output) . 

- the maximum number of Lanczos vectors that can be stored 
in the array VW. It is related to the size of the largest 
block that can be built . The algorithm runs out of memory 
when the number of vectors in the last block and in the 
current block reaches M. For this reason, M must be at 
least 3; checked for validity (input) . 

= work array dimensioned (NDIM,2*M+2) words. It is used to 
store the Lanczos vectors V in VW(:,1:M), the vectors W 
in VW ( : , M+l ; 2 *M) , and two temporary vectors used by DLAL 
in VW(:,2*M+1) and VW(:,2*M+2). The Lanczos vectors V and 
W are stored wrapped, i.e., V_N is stored in VW(:,Q(N)) 
and W_N is stored in VW ( : ,M+Q (N) ) , where Q(N) is assumed 
to be a wrapped index array -- see below (input /output) . 

* work array dimensioned (5,M), used to store the various 
scale factors. We have: 


SC (1, i) - S (i) / S(i-l) 

SC (2, i) - T (i) / T(i-l) 

SC (3, i) - S (i) / T (i ) 

SC (4 , i) - CSI(i) 

SC (5, i) - SIG(i) 

Note that the scale routine DSCALE expects to receive the 
scale factors in a 5x1 vector as the one described above. 
This routine initializes the first column; thereafter, 
the DLAL routine will update the array (input /output ) . 

- work array dimensioned (M,5*M+14), used to store internal 
variables. We have: 

WK ( : , 1 :M) - (W_{NK} A T V_(NK) ) 

WK ( : ,M+1) to WK ( : , 2*M) 


WK ( : , 2*M+1) 

WK ( : , 3*M+1) 
WK ( : , 3*M+2 ) 
WK ( : , 3*M+3) 
WK ( : , 3*M+4 ) 


- (W_{NK} A T V_{NK}) 'M-l) 
to WK ( : , 3*M) 

« work array for the SVD routine DSVDC 

- temporary vector 

- temporary vector 

- temporary vector 

- the saved last column of the matrix 


WK ( : , 3*M+5) 
WK ( : , 3*M+6 ) 
WK(:,3*M+7) 

WK ( : , 3*M+8 ) 


(W_{NKM1 } A T V_{NKM1 } ) * { -1 } 

- H(NK:NKP1,N) 

- H (NK:NKP1,N+1) 

- the saved part of H (N) , used in case 
the block is restarted 

- the saved part of H(NP1), used in 
case the block is restarted 


(input /output) . 

— integer array specifying the indices for all the wrapped 
variables (V,W, SC,WK). To allow the algorithm to run more 
than M steps, these variable wrap around, in that Q(I) is 
the index of the slots where that variables are stored at 
the I-th step. Normally, these indices would be in order, 
basically Q(I) - I MOD M+l, but the algorithm makes no 
assumptions to this effect. These indices are not checked 
in any way for validity (input) . 

- vector with the tolerances used in the various checks. We 


have : 

T0L(1) - upper bound for the range of CSI and SIGMA 
TOL (2) = lower bound for the range of CSI and SIGMA 
TOL (3) = level below which the norms of the Lanczos 
vectors are considered zero (used to check if 
an invariant subspace was found) 
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c 
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c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 
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T0L(4) — level below which the singular values of 
(W_{NK} / 'T V_ { NK } ) are considered zero 
Note that the scale routine DSCALE expects to receive the 
tolerances in a 4x1 vector as the one described above. If 
the user provides non-positive values for any of T0L(1:4) 
then the DLAL routine will replace them with defaults, as 
follows : 


T0L(1) - 1/TOL (2 ) 

TOL (2) - eps A {l/2} 

TOL (3) - eps A { 1/4 } 

TOL ( 4 ) - eps~{l/3} 

Here, eps is machine epsilon (input /output) . 

ANORM = user-supplied estimate for the norm of A. If a value less 
than 2.0 is provided, it is replaced with 2.0. On exit, 
it is set to the last value used by the algorithm, which 
updates the norm estimate whenever it is necessary to 
close a block ( input /output ) . 

INFO - information passing variable. 

Upon entry, it gives the numbers of the output units used 
to trace execution. There are two such units available, 
one where the non-zero elements of the Hessenberg matrix 
H are sent, and the other where various trace messages 
about the progress of the algorithm are sent. The data in 
H is output in groups of numbers, one number per line, as 
follows : each group begins with two integers that specify 
the starting and ending row indices, followed by the 
actual values, reals output in format E25.18. Note that 
the ending row index is one higher than the column index, 
as H is upper Hessenberg. To extract the unit numbers for 
the two units, INFO - xxyy, where xx is the unit number 
for the data for H, and yy is the unit number for the 
trace messages. For example, INFO - 1106 means that the 
data for H will be sent to unit 11 and the trace messages 
will be sent to unit 6. A unit number of 00 denotes that 
no output is to be sent to that unit . INFO - 0 disables 
both outputs. It is the responsibility of the caller to 
ensure that the units are ready for output. 

Upon exit : 


H 


INFO « 0 ■■> nothing to report, algorithm converged 

INFO < 0 «> the SVD routine returned this error code 

in DLAL (with positive sign) 

INFO - 1 “> an A-invariant subspace has been found 

INFO - 2 — > an A A T-invariant subspace has been found 

INFO - 3 -**> both subspaces have been found 
INFO - 4 —> the last block could not be closed 
INFO - 8 — > algorithm failed to converge after NLIM 

steps 

INFO - 16 -■> invalid inputs 

see the description in the routine DLAL 


For more details, 
(input /output) . 
array dimensioned 


(NLIM, NLIM) used for H (output) 


External routines used: 
subroutine dcopy (n, dx, incx, dy, incy) 

Computes dy - dx. 

double precision ddot (n, dx, incx, dy, incy) 

Computes the dot product of dx and dy. 
subroutine dial (ndim, nlen,m, n, nk, nkml , vw, sc, wk, q, norms, tol, info) 
Does one step of the look-ahead Lanczos algorithm, 
subroutine dscal (n, da, dx, incx) . 

Computes dx - da * dx. 
subroutine dscale (n, v, w, sc, tol) 

Scales the Lanczos vectors v and w. 
subroutine dzero (n, dx, incx) 

Zeroes out dx. 
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Noel M. Nachtigal 
October 25, 1990 

it********************************************************************* 


INTRINSIC MAX 

EXTERNAL DDOT 
DOUBLE PRECISION DDOT 

INTEGER HDIM, INFO, M, NLEN, NLIM, NDIM, Q(NLIM) 

DOUBLE PRECISION ANORM, H (HDIM, HDIM) , SC(5,M), T0L(5) 

DOUBLE PRECISION VW (NDIM, 2*M+2) , WK(M,3*M+8) 

Local variables . 

INTEGER I, N, NK, NKM1, ONKM1 
INTEGER HF, TF 

DOUBLE PRECISION DTMP, NORMS (2) 

Check whether the inputs are valid. 

IF ( (NDIM.LT. 1) .OR. (NLEN. GT .NDIM) .OR. (HDIM. LT . 1) .OR. (NLIM . GT . HDIM) 
$ .OR. (M.LT.3) ) THEN 

INFO - 16 
RETURN 
END IF 

Extract the output units HF and TF from INFO. 

HF - INFO / 100 
TF - INFO - HF * 100 

Initialize the counters. 

N - 1 

NK - 1 

NKM1 - 0 

Scale the first pair of Lanczos vectors. 

SC (4, Q (1) ) - 1.0 
SC (5, Q (1) ) - 1.0 
SC (3,Q (1) ) - 1.0 

CALL DSCALE (NLEN, VW (1, Q (1) ) , VW (1,M+Q (1) ) , SC (1,Q (1) ) , TOL) 

Check for convergence (already?) . 

INFO - 0 

IF (SC (4, Q (1) ) .EQ.-1.0) INFO - INFO + 1 
IF (SC(5,Q(1) ) .EQ.-1.0) INFO - INFO + 2 
IF (INFO.NE.O) RETURN 

Set up WK (1, 1) . 

WK (1, 1) - DDOT (NLEN, VW ( 1, Q (1) ) , 1, VW (1,M+Q (1) ) , 1 ) 

WK (1,1) “ SC(5,Q(1)) * SC (4 , Q (1 ) ) * WK(1,1) 

Initialize the norm estimate. It has to be at least 2.0. 

DTMP “ 2.0 

NORMS (2) - 0.0 
NORMS (1) - MAX (ANORM, DTMP) 


Iterate . 



10 ONKM1 - MAX (1,NKM1) 

C 

C If we have closed a block, save the working variables, in case we 

C need to restart. Also, reset the norm estimator. 

C 

IF (N.EQ.NK) THEN 

CALL DCOPY (N-NKM1+1 , WK ( 1 , 3*M+6 ) , 1 , WK (1, 3*M+8) , 1 ) 

CALL DCOPY (N-0NKM1+1, WK (1 , 3*M+5 ) , 1 , WK (1, 3*M+7 ) , 1) 

NORMS (2) =0.0 
END IF 
C 

C Check whether we have enough room left in the arrays . 

C 

20 INFO = 0 

IF (N-NKM1+2.GE.M) INFO = 1 

IF ( (INFO.NE.O) .AND. (TF.NE.O) ) THEN 

WRITE (TF, '(A39)') 'Block is maximal, recommending closure.' 
END IF 
C 

C Set ANORM to the current value of the norm estimate . 

C 

ANORM - NORMS ( 1 ) 

C 

C Do one step of the Lanczos algorithm. 

C 

CALL DLAL (NDIM, NLEN, M, N, NK, NKM1 , VW, SC, WK, Q, NORMS, TOL, TF, INFO) 

C 

C Check the info passing variable. 

C We check whether the DSVDC routine reported errors or whether the 
C block did not close when it was maximal, both of which result in 

C an immediate return, and whether an invariant subspace was found, 

C which just stops the iteration. 

C 

IF (INFO.LT.O) THEN 
NLIM - N 
RETURN 

ELSE IF (INFO.EQ.l) THEN 
NLIM = N 

ELSE IF (INFO.EQ.2) THEN 
NLIM = N 

ELSE IF (INFO.EQ.l + 2) THEN 
NLIM = N 

ELSE IF (INFO.EQ. 4) THEN 
N = NK 

IF (TF.NE.O) WRITE (TF,'(A20)') 'Block did not close:' 

C 

C Block did not close, do we have another norm estimate? 

C 

IF (NORMS ( 2 ) . EQ .0.0) THEN 

IF (TF.NE.O) WRITE (TF,'(A47)') 

$ '==> no new norm estimates available (aborting) . ' 

NLIM = N 
RETURN 
ELSE 
C 

C Update the norm the block is guaranteed to close now. We then 

C restart the block. 

C 

NORMS (2) = 2.0 * NORMS (2) 

IF (TF.NE.O) WRITE (TF, ' (A30, E25.18)') 

$ '==> updating norm estimate to ', NORMS (2) 

CALL DCOPY (N-NKM1+1, WK (1 , 3*M+8 ) , 1, WK (1, 3*M+6) , 1) 

CALL DCOPY (N-ONKM1+1 , WK (1, 3*M+7) , 1, WK (1, 3*M+5) , 1 ) 

NORMS (1) = NORMS (2) 

NORMS (2) =0.0 
GO TO 20 
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END IF 
END IF 

Output H to the trace file. 

IF (HF.NE.O) THEN 

WRITE (11,' (120)') ONKM1 
WRITE (11, ' (120) ' ) N 

WRITE (11, ' (E25.18) ' ) (WK ( I-ONKM1+1, 3*M+5) , I-0NKM1,N) 

END IF 

Initialize the "next" column of H. First, zero it all out, then, 
copy the part we have so far. 


CALL DZERO (NLIM, H (1, N-l) , 1) 

CALL DCOPY (N-ONKM1+1, WK(1, 3*M+5) , 1,H (0NKM1,N-1) , 1) 


Set up the work vector for the next step. 

CALL DCOPY (N-NKM1+1, WK (1 , 3*M+6 ) , 1, WK (1 , 3*M+5) , 1) 


Iterate up to NLIM steps. 


IF (N.LT.NLIM) GO TO 10 

Adjust the dimension NLIM, which right now is one bigger than the 
size of the biggest usable submatrix of H. 


NLIM - NLIM - 1 


RETURN 

END 


********************************************************************** 


c********************************************************************** 

c 

C Copyright (C) 1990, Roland W. Freund and Noel M. Nachtigal 

C All rights reserved. 

C 

C No part of this code may be reproduced, stored in a retrieval 

C system, translated, transcribed, transmitted or distributed in 

C any form or by any means means, manual, electric, electronic, 

C electo-magnetic, mechanical, chemical, optical, photocopying, 

C recording, or otherwise, without the prior explicit written 

C permission of the author (s) or their designated proxies. In no 

C event shall the above copyright notice be removed or altered in 

C any way . 

C 

C This code is provided "as is", without any warranty of any kind, 
C either expressed or implied, including but not limited to, any 

C implied warranty of merchantibility or fitness for any purpose. 

C In no event will any party who distributed the code be liable for 

C damages or for any claim (s) by any other party, including but not 

C limited to, any lost profits, lost monies, lost data or data 

C rendered inaccurate, losses sustained by third parties, or any 

C other special, incidental or consequential damages arising out of 

C the use or inability to use the program, even if the possibility 

C of such damages has been advised against. The entire risk as to 

C the quality, the performance, and the fitness of the program for 
C any particular purpose lies with the party using the code . 

C 

C ***************************************************************** 

C ANY USE OF THIS CODE CONSTITUES ACCEPTANCE OF THE TERMS OF THE 
C ABOVE STATEMENTS 

C ***************************************************************** 

c 

0 * ******* ******** ****************** ****** ********************** ******** 

c 

C This file contains the routines for the QMR algorithm. SYSLAL is 

C the basic routine used to solve linear systems with QMR. BCKSUB 

C is a triangular matrix solver geared towards the setup used by 

C the Lanczos code (in particular, vector wrapping) . 

C 

C********************************************************************** 

c 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


SUBROUTINE SYSLAL (NDIM, NLEN, NLIM,M, VW, SC, WK, Q, TOL, ANORM, INFO) 
Purpose : 

This subroutine uses the Lanczos algorithm, combined with the QMR 
algorithm, to solve linear systems . It runs the QMR algorithm to 
convergence or until a user-specified iteration limit is reached. 
The caller initializes the following: 

VW(:,Q(1) - the first Lanczos vector V_l / the residual for 

the initial guess 
the first Lanczos vector W_1 
the array of wrapped indices 

tolerances for the Lanczos algorithm (optional) 
convergence tolerance for the residual norm 
estimate for the norm of the matrix (optional) 
the output units, if any 
If the user provides non-positive values for the tolerances in 
TOL (1:4), the Lanczos routine DLAL will supply its own defaults. 
Also, if the user provides a norm estimate of less than 2.0, the 
Lanczos routine DLAL will set the norm estimate to 2.0. 

Parameters : 

NDIM -= the dimensioned size of the array VW. Must be at least 1; 
checked for validity (input) . 

NLEN - the actual size of the Lanczos vectors V and W; this also 
implicitly determines the size of the matrix A. Must be 


VW ( : , M+Q ( 1 ) ) 

Q 

TOL (1:4) 

TOL (5) 

ANORM 

INFO 


NLEN 



NLIM 
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vw 


sc 


WK 


less than or equal to NDIM; checked for validity (input) . 

- the maximum number of iterations the algorithm can take. 
Must be at least 1; checked for validity. On exit, it is 
the index of the last step taken or attempted, depending 
on INFO, see below (input /output) . 

- the maximum number of Lanczos vectors that can be stored 
in the array VW. It is related to the size of the largest 
block that can be built . The algorithm runs out of memory 
when the number of vectors in the last block and in the 
current block reaches M. For this reason, M must be at 
least 3; checked for validity (input) . 

- work array dimensioned (NDIM, 3*M+4) words. It is used to 
store the Lanczos vectors V in VW(:,1:M), the vectors W 
in VW ( : , M+l : 2*M) , two temporary vectors used by DLAL in 
VW ( : , 2*M+1) and VW(:,2*M+2), the right hand side vector B 
in VW(:,2*M+3), the current solution X_N in WV(:,2*M+4), 
and the update direction vectors PR in VW(: ,2*M+5:3*M+4) . 
The Lanczos vectors V and W and the direction vectors PR 
are stored wrapped, i.e., V_N is stored in VW(:,Q(N)) and 
W_N is stored in VW ( : ,M+Q (N) ) , where Q (N) is assumed to 
be a wrapped index array -- see below ( input /output ) . 

- work array dimensioned (5,M), used to store the various 
scale factors. We have: 


SC (1, i) - S (i) / S(i-l) 

SC (2, i) - T (i) / T(i-l) 

SC (3, i) - S (i) / T (i) 

SC (4 , i) - CSI(i) 

SC (5 , i) - SIG(i) 

Note that the scale routine DSCALE expects to receive the 
scale factors in a 5x1 vector as the one described above. 
This routine initializes the first column; thereafter, 
the DLAL routine will update the array (input /output) . 

- work array dimensioned (M,5*M+14), used to store internal 


variables . We have : 


WK ( 
WK ( 

WK ( 

WK ( 
WK ( 
WK ( 
WK ( 


,1:M) - (W_{NK) A T V_{NK>) 

,M+1) to WK(:72*M) 

- (W_{NK) A T V_{NK}) A {-1} 

, 2*M+1) to WK ( : , 3*M) 

“ work array for the SVD routine DSVDC 
,3*M+1) - temporary vector 

,3*M+2) » temporary vector 

,3*M+3) « temporary vector 

, 3*M+4 ) - the saved last column of the matrix 


WK ( : , 3*M+5) 
WK ( : , 3*M+6) 
WK ( : , 3*M+7 ) 


WK ( 


3*M+8 ) 


WK ( 
WK ( 

WK ( 

WK ( 

WK ( 


, 3*M+9) 

, 3*M+10) 

, 3*M+11) 

, 3*M+12) 

, 3*M+13) 


WK ( : , 3*M+14) 
WK ( : , 3*M+15) 


WK ( 


4*M+15) 


(W_{NKM1 } A T VJNKMl}) A {-1) 

- H (NK:NKP1,N) 

- H (NK : NKP1 , N+l ) 

■ the saved part of H (N) , used in case 
the block is restarted 

- the saved part of H(NP1), used in 
case the block is restarted 

- the scale factors OMEGA (wrapped) 

- the cosines of the Givens rotations 
(wrapped) 

- the sines of the Givens rotations 
(wrapped) 

- the rotated right hand side for the 
least squares problem (wrapped) 

= the elements of the row vectors ZNK 
(wrapped) 

- the last column of R_{NKM2 } * { -1 } 
(wrapped) 

to WK ( : , 4*M+14) 

= the matrix Y_NK (wrapped) 

to WK ( : , 5*M+14 ) 

- the matrix R NK (wrapped) 


(input/output ) . 


oooooooooo 
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TOL 


ANORM 


INFO 


- integer array specifying the indices for all the wrapped 
variables (V, W,SC,WK) . To allow the algorithm to run more 
than M steps, these variable wrap around, in that Q(I) is 
the index of the slots where that variables are stored at 
the I-th step. Normally, these indices would be in order, 
basically Q(I) - I MOD M + 1, but the algorithm makes no 
assumptions to this effect. These indices are not checked 
in any way for validity (input) . 

“ vector with the tolerances used in the various checks . We 
have : 


TOL ( 1 ) - upper bound for the range of CSI and SIGMA 
TOL (2) - lower bound for the range of CSI and SIGMA 
TOL (3) = level below which the norms of the Lanczos 
vectors are considered zero (used to check if 
an invariant subspace was found) 

TOL (4) * level below which the singular values of 
(W_{NK} A T V_{NK}) are considered zero 
TOL (5) = the convergence level for the scaled residual 
norm 

Note that the scale routine DSCALE expects to receive the 
tolerances in a 4x1 vector as the one described above. If 
the user provides non-positive values for any of TOL (1:4) 
then the DLAL routine will replace them with defaults, as 
follows : 


TOL (1) - 1/TOL (2 ) 

TOL (2) - eps A { 1/2 } 

T0L(3) - eps A {l/4} 

TOL ( 4 ) - eps^{l/3) 

Here, eps is machine epsilon (input /output) . 

“ user-supplied estimate for the norm of A. If a value less 
than 2.0 is provided, it is replaced with 2.0. On exit, 
it is set to the last value used by the algorithm, which 
updates the norm estimate whenever it is necessary to 
close a block (input /output) . 

“ information passing variable. 

Upon entry, it gives the numbers of the output units used 
to trace execution, and controls the generation of the 
convergence history. Up to three trace output units can 
be spcified: one where the non-zero elements of the upper 
Hessenberg matrix H are sent, another where various trace 
messages with details about the progress of the algorithm 
are sent, and finally a third one, where data about the 
convergence history is sent . The data in H is output in 
groups of numbers, one number per line, as follows: each 
group begins with two integers that specify the starting 
and ending row indices, followed by the actual values, 
reals output in format E25.17. Note that the ending row 
index is one higher than the column index, as H is upper 
Hessenberg. Also note that since H is output before the 
block is checked for forced closure, data for any columns 
in H may appear several times; of course, only the latest 
is valid. The convergence history data is output as up 
to three numbers on a line, the first one an integer, the 
size of the current block, the other two reals, output in 
format E25.17, the first one if which is the upper bound 
used in the convergence tests, and the second one is the 
computed residual norm, when actually computed, or -1.0 
otherwise. To extract the unit numbers for these units, 
INFO - txxyyzz, where xx is the unit number for the data 
for H, and yy is the unit number for the convergence data 
and zz of the unit number for the trace messages. Also, 
if t.NE.O, then the actual residual norm is computed at 
every step. For example: 

INFO = 1106 -=> trace messages are sent to unit 6 , 

convergence data sent to unit 11 
INFO - 1000000 ==> the true residual norm is always 
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computed 

INFO - 5121314 — > always compute true residual norm, 

send trace messages to unit 14, the 
convergence data to unit 13, and H 
to unit 12 
INFO - 0 --> no output. 

Note that INFO must be a 32-bit integer, and that it is 
the responsibility of the caller to ensure that the units 
used are ready for output . 

Upon exit : 

INFO - 0 — > nothing to report, algorithm converged 

INFO < 0 -»> the SVD routine returned this error code 

in DLAL (with positive sign) 

INFO - 1 — > an A-invariant subspace has been found 

INFO - 2 --> an A A T-invariant subspace has been found 

INFO - 3 — > both subspaces have been found 
INFO - 4 ■•“> the last block could not be closed 
INFO - 8 --> algorithm failed to converge after NLIM 

steps 

INFO - 16 — > invalid inputs 

For more details, see the description in the routine DLAL 
(input /output) . 

External routines used: 
subroutine axb(x,b) 

Computes b - A * x. 

subroutine bcksub (ndim, nlen, nstart, a, xb,q) 

Computes xb - inv(a) * xb with a upper triangular (specific to 
our setup, i.e., the columns of a are permuted according to q) . 
subroutine daxpy (n, da, dx, incx, dy, incy) 

Computes dy - da * dx + dy. 
subroutine dcopy (n, dx, incx, dy, incy) 

Computes dy - dx. 

double precision ddot (n, dx, incx, dy, incy) 

Computes the dot product of dx and dy. 
subroutine dial (ndim, nlen,m, n, nk, nkml, vw, sc, wk, q, norms, tol, info) 
Does one step of the look-ahead Lanczos algorithm, 
double precision dnrm2 (n, dx, incx) 

Computes the 2-norm of dx. 
subroutine drot (n, dx, incx, dy, incy, dcos, dsin) 

Applies a Givens rotation to a vector, 
subroutine drotg (da, db, dcos, dsin) 

Computes a Givens rotation, 
subroutine dscal (n, da, dx, incx) . 

Computes dx ■ da * dx. 
subroutine dscale (n, v, w, sc, tol) 

Scales the Lanczos vectors v and w. 
subroutine dzero (n, dx, incx) 

Zeroes out dx. 
double precision getomg(n) 

Computes the scaling factor OMEGA_n. 

Noel M. Nachtigal 
October 24, 1990 

it******************************************************************* 

INTRINSIC FLOAT, MAX, SQRT 

EXTERNAL DDOT, DNRM2, GETOMG 
DOUBLE PRECISION DDOT, DNRM2, GETOMG 

INTEGER INFO, M, NLEN, NLIM, NDIM, Q (NLIM) 

DOUBLE PRECISION ANORM, SC(5,M), TOL(5) 

DOUBLE PRECISION VW (NDIM, 3*M+4) , WK(M,5*M+14) 


C Local variables . 

C 

INTEGER I, J, LNK, LNKM1, LNKM2, N, NK, NKM1, ONK, ONKM1, 0NKM2 
INTEGER HBASE, HF, TF, TRES, VF 

DOUBLE PRECISION DTMP1, DTMP2 , MAXOMG, NORMS (2), RO, SAVRHS 
C 

C Check whether the inputs are valid. 

C 

IF ( (NDIM.LT . 1) .OR. (NLEN . GT . NDIM) .OR. (NLIM.LT.l) .OR. (M.LT.3) ) THEN 
INFO - 16 
RETURN 
END IF 
C 

C Extract the output units HF, TF, and VF from INFO, and the true 

C residual flag TRES. 

C 

TRES - INFO / 1000000 
INFO - INFO - TRES * 1000000 
HF - INFO / 10000 
INFO - INFO - HF * 10000 
TF - INFO / 100 
INFO - INFO - TF * 100 
VF *= INFO 
C 

C Initialize the counters. 

C 

N - 1 
NK - 1 
NKM1 - 0 
C 

C Scale the first pair of Lanczos vectors. 

C 

SC (4,Q (1) ) - 1.0 
SC (5, Q (1) ) - 1.0 
SC (3, Q (1) ) - 1.0 

CALL DSCALE (NLEN, VW (1, Q (1) ) , VW (1 , M+Q (1) ) , SC (1, Q (1) ) , TOL) 

C 

C Check for convergence (already?) . 

C 

INFO - 0 

IF (SC(4,Q(1)) .EQ.-1.0) INFO - INFO + 1 
IF (SC(5,Q(1) ) .EQ.-1.0) INFO - INFO + 2 
IF (INFO.NE . 0) RETURN 
C 

C Set up WK (1,1) . 

C 

WK(1, 1) - DDOT (NLEN, VW(1, Q ( 1) ) ,1, VW (1 ,M+Q (1) ) , 1) 

WK (1, 1) - SC (5, Q (1) ) * SC (4, Q (1) ) * WK(1,1) 

C 

C Set up the first element of the right-hand side. 

C 

WK(Q (1) , 3*M+9) - GETOMG(l) 

WK(Q(1) ,3*M+12) - WK(Q(1) ,3*M+9) *SC(1,Q(1)) 

MAXOMG - 1.0 / WK (Q (1) , 3*M+9) 

C 

C Initialize the norm estimate. It has to be at least 2.0. 

C 

DTMP1 - 2.0 
NORMS (2) - 0.0 
NORMS (1) - MAX (ANORM, DTMP1) 

C 

C Compute and save the initial residual norm in R0. 

C 

R0 - DNRM2 (NLEN, VW(1, Q (1) ) , 1) 

C 

C Iterate . 


c 

10 ONK - NK 

ONKM1 - MAX (1 , NKM1 ) 

C 

C If we have closed a block, save the working variables, in case we 

C need to restart. Also, reset the norm estimator. 

C 

IF (N.EQ.NK) THEN 

CALL DCOPY (N-NKM1+1,WK(1, 3*M+6) , 1,WK(1, 3*M+8) , 1) 

CALL DCOPY (N-0NKM1+1 , WK (1 , 3*M+5) , 1, WK (1, 3*M+7 ) , 1) 

SAVRHS - WK(Q(N) ,3*M+12) 

NORMS ( 2 ) - 0.0 
END IF 
C 

C Check whether we have enough room left in the arrays. 

C 

20 INFO = 0 

IF (N-NKM1+2.GE.M) INFO - 1 

IF ( (INFO. NE.O) .AND. (VF.NE.O) ) THEN 

WRITE (VF,'(A39)') 'Block is maximal, recommending closure.' 
END IF 
C 

C Set ANORM to the current value of the norm estimate . 

C 

ANORM - NORMS (1) 

C 

C Do one step of the Lanczos algorithm. 

C 

CALL DLAL (NDIM, NLEN, M, N, NK, NKM1, VW, SC, WK, Q, NORMS, TOL, VF, INFO) 

C 

C Check the info passing variable. 

C We check whether the DSVDC routine reported errors, whether the 

C block did not close when it was maximal, or whether an invariant 

C subspace was found. 

C 

IF (INFO. LT . 0 ) THEN 
NLIM = N 
RETURN 

ELSE IF (INFO.EQ. 1) THEN 
NLIM - N 
RETURN 

ELSE IF (INFO.EQ. 2) THEN 
NLIM - N 
RETURN 

ELSE IF (INFO.EQ. 1 + 2) THEN 
NLIM - N 
RETURN 

ELSE IF (INFO.EQ. 4) THEN 
N - NK 

IF (VF.NE.O) WRITE (VF,'(A20)') 'Block did not close:' 

C 

C Block did not close, do we have another norm estimate? 

C 

IF (NORMS (2) .EQ. 0.0) THEN 

IF (VF.NE.O) WRITE (VF,'(A47)') 

$ '-■> no new norm estimates available (aborting).' 

NLIM - N 
RETURN 
ELSE 
C 

C Update the norm the block is guaranteed to close now. We then 

C restart the block. 

C 

NORMS (2) =2.0* NORMS (2) 

IF (VF.NE.O) WRITE (VF, ' (A30, E25.17)') 

'==> updating norm estimate to ', NORMS (2) 


$ 


CALL DCOPY (N-NKM1+1 , WK ( 1 , 3*M+8 ) , 1 , WK (1, 3*M+6 ) , 1 ) 

CALL DCOPY (N-0NKM1+1 , WK ( 1, 3*M+7) , 1,WK(1, 3*M+5) , 1) 

NORMS (1) - NORMS (2) 

WK (Q (N) , 3*M+12 ) - SAVRHS 
NORMS (2) - 0.0 
GO TO 20 
END IF 
ELSE 

INFO - 8 
END IF 
C 

C Output H to the trace file. 

C 

IF (HF.NE.O) THEN 

WRITE (11, '(120)') ONKM1 
WRITE (11, ' (120) ' ) N 

WRITE (11, ' (E25. 17) ' ) (WK ( I-ONKM1+1 , 3*M+5) , I-ONKM1 , N) 

END IF 
C 

C Get the next scaling factor OMEGA (NP1) and update MAXOMG. 

C 

WK(Q(N) , 3*M+9) - GETOMG(N) . 

MAXOMG - MAX (MAXOMG, 1. 0/WK (Q (N) ,3*M+9) ) 

C 

C Compute the starting index in H(:,N-1). 

C 

HBASE - MAX (l,ONKMl-l) 

C 

C Multiply the column of H by the OMEGAs and apply all the previous 
C rotations . 

C 

WK (1, 3*M+5) - WK(Q (HBASE) ,3*M+9) * WK(l,3*M+5) 

C 

C If we are beyond the first block, then there is fill-in above the 

C top element in this column of H. This element is saved in ZNK. 

C 

IF (ONKM1.GT.1) THEN 
DTMP1 - 0.0 

CALL DROT (1 , DTMP1, 1, WK (1, 3*M+5) , 1, 

$ WK (Q (ONKM1) ,3*M+10) ,WK(Q(0NKM1) ,3*M+11) ) 

WK (Q (N-l ) , 3*M+13 ) - DTMP1 
END IF 
C 

C Apply the rotations to the rest of the column. 

C 

DO 30 I - ONKM1+1 , N-l 
J - I - ONKM1 

WK(J,3*M+5) = WK(Q(I) ,3*M+9) *WK(J,3*M+5) 

CALL DROT ( 1 , WK ( J, 3*M+5) , 1 , WK ( J+l, 3*M+5) , 1, 

$ WK (Q (I) , 3*M+10) , WK ( Q ( I ) ,3*M+11) ) 

30 CONTINUE 
C 

C H (N) is not reached by the any of the rotations. Multiply it by 

C its OMEGA and compute the rotation for the column. This will also 

C apply it . 

C 

J - N - ONKM1 

WK ( J, 3*M+5) - WK(Q(N) ,3*M+9) * WK(J,3*M+5) 

CALL DROTG (WK( J, 3*M+5) ,WK( J+l, 3*M+5) , 

$ WK(Q(N) ,3*M+10) ,WK(Q(N) ,3*M+11) ) 

C 

C Apply it to the right-hand side as well. 

C 

WK (Q (N) , 3*M+12 ) - 0.0 

CALL DROT ( 1 , WK (Q (N-l ) , 3*M+12 ) , 1, WK (Q (N) , 3*M+12) , 1, 

$ WK (Q (N) , 3*M+10 ) , WK (Q (N) , 3*M+11 ) ) 


oono o ooooo o onoo non o non ooo non non non non 


Extract the next column of Y_{NK} . 

IF (0NK.GT.0NKM1) THEN 

CALL DCOPY (ONK-ONKMl,WK(l,3*M+5) , 1,WK(1,3*M+14+Q(N-1) ) , 1) 

END IF 

Extract the next column of R_{NK} . 

CALL DCOPY (N-ONK, WK (0NK-0NKM1+1, 3*M+5) , 1, WK (1, 4*M+14+Q (N-l) ) , 1) 

Set up the work vector for the next step. 

CALL DCOPY (N-NKM1+1, WK (1, 3*M+6) , 1, WK (1, 3*M+5) , 1) 

If we have closed a block, update the solution. 

IF (NK.NE.ONK) THEN 

Compute the lengths of the blocks . 

LNK - N - ONK 
LNKM1 - ONK - 0NKM1 
LNKM2 - 0NKM1 - 0NKM2 

Zero out P_{NK} R_{NK} . 

DO 40 I - ONK, N-l 

CALL DZERO (NLEN, VW (1 , 2*M+4+Q (I) ) , 1) 

0 CONTINUE 

IF (ONKM1.GT.1) THEN 

Compute the term involving P_{NKM2}. 

IF (LNKM2.EQ.1) THEN 

LNKM2 - 1 — > R_{NKM2 } A {-1} is a lxl matrix, so we first multiply 
ZNK by this scalar, then ( P_{NKM2} R_{NKM2} ) by the result. 

DTMP1 - WK (Q (ONKM1-1 ) , 3*M+14 ) 

DO 50 I - ONK, N-l 

DTMP2 - DTMP1 * WK (Q ( I) , 3*M+13) 

CALL DCOPY (NLEN, VW (1, 2*M+4+Q (ONKM1-1) ) , 1, 

$ VW (1, 2*M+4+Q (I) ) , 1) 

CALL DSCAL (NLEN, -DTMP2, VW(1, 2*M+4+Q (I) ) , 1) 

50 CONTINUE 

ELSE IF (LNK.EQ.l) THEN 

LNK - 1 --> ZNK is a lxl vector, so we first multiply the last 
column of R_{NKM2 } A {-1 } by ZNK first, then ( P_{NKM2} R_{NKM2 } ) 
by the result. 

DTMP1 « WK (Q (ONK) , 3*M+13) 

DO 60 I - ONKM2, ONKM1-1 

DTMP2 - DTMP1 * WK(Q (I) ,3*M+14) 

CALL DAXPY (NLEN, -DTMP2, VW(1, 2*M+4+Q (I) ) , 1, 

$ VW (1 , 2*M+4+Q (ONK) ) , 1) 

60 CONTINUE 

ELSE 

Otherwise, we first multiply ( P_{NKM2} R_{NKM2} ) by the last 
column of R_{NKM2 } * { -1 } , accumulating the result in the first 
column of ( P_{NK} R_ { NK } ), which was zeroed out. 
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DO 70 I - ONKM2 , ONKM1-1 

CALL DAXPY (NLEN, WK (Q (I) , 3*M+14) , VW(1, 2*M+4+Q (I) ) , 1, 

$ VW (1, 2*M+4+Q (ONK) ) , 1) 

CONTINUE 

Then multiply by the ZNK vector, which involves just scaling the 
above column by the appropriate factor and storing it in its slot 
in the array. 

DO 80 I - ONK+1, N-l 

CALL DCOPY (NLEN, VW( 1,2 *M+4+Q (ONK) ) ,1, 

$ VW (1,2 *M+4 +Q ( I ) ) , 1 ) 

CALL DSCAL (NLEN, -WK (Q (I) , 3*M+13) , VW (1, 2*M+4+Q (I) ) , 1 ) 
CONTINUE 

CALL DSCAL (NLEN, -WK (Q (ONK) , 3*M+13) , 

S VW (1, 2*M+4+Q (ONK) ) , 1) 

END IF 
END IF 

IF (ONK.GT.ONKM1) THEN 
Compute the term involving P_{NKM1). 

DO 100 I - ONK, N-l 

Compute the next column of R_ {NKM1 } A { -1 } * Y_{NK} . 

CALL BCKSUB (M, LNKM1 , ONKM1 , WK (1, 4*M+15) , 

S WK(1,3*M+14+Q(I) ) ,Q) 

Multiply ( P_{NKM1) R_{NKM1) ) by the new column and add to the 
appropriate column of ( P_{NK) R_{NK) ) . 

DO 90 J - 1, LNKM1 

CALL DAXPY (NLEN, -WK ( J, 3*M+14+Q (I) ) , 

$ VW (1, 2*M+4+Q (ONKM1+J-1) ) , 1, VW(1, 2*M+4+Q (I) ) ,1) 

CONTINUE 
CONTINUE 

Compute the last column of R_{NKM1) and store it for when it will 
become the last column of R_{NKM2 } . 

CALL DZERO (M, WK(1, 3*M+14+Q (ONK) ) , 1) 

WK (LNKM1, 3*M+14+Q (ONK) ) - 1.0 

CALL BCKSUB (M, LNKM1, ONKM1, WK (1, 4*M+15) , 

$ WK (1, 3*M+14+Q (ONK) ) , Q) 

DO 110 I - ONKM1, ONK-1 

WK(Q(I) ,3*M+14) - WK (I-ONKM1+1, 3*M+14+Q (ONK) ) 

CONTINUE 
END IF 

Now add VJNK) into ( P_{NK) R_{NK} ) . We also use this loop to 
copy the right hand side of the least squares problem to a work 
vector. 

DO 120 I - ONK, N-l 

CALL DAXPY (NLEN, SC (5,Q (I) ) , VW(1,Q(I) ) , 1, 

$ VW (1,2 *M+4 +Q ( I ) ) , 1 ) 

WK (I-ONK+1, 3*M+14+Q (ONK) ) - WK (Q (I) , 3*M+12) 

CONTINUE 

Compute R_{NK} A {-1) * RHS . 


CALL BCKSUB (M, LNK, ONK, WK ( 1 , 4 *M+15 ) , WK (1 , 3*M+14+Q (ONK) ) , Q) 



c 

C Update the solution vector. 

C 

DO 130 I - ONK, N-l 

CALL DAXPY (NLEN, WK (I-ONK+1, 3*M+14+Q (ONK) ) , 

$ VW(l,2*M+4+Q(I) ) , 1 , VW (1, 2*M+4 ) ,1) 

130 CONTINUE 

C 

C Compute the residual norm upper bound. 

C 

DTMP1 - SQRT (FLOAT (N) ) * MAXOMG * ABS (WK (Q (N) , 3*M+12 ) ) / R0 
IF (VF.NE.O) WRITE (VF, ' (A30 , E25 . 17) ' ) ' Upper bound: ' f DTMP1 
C 

C If the upper bound is within one order of magnitude of the target 

C convergence norm, compute the actual scaled residual norm. 

C 

IF ( (TRES.NE.O) .OR. (DTMP1/TOL (5) .LE.10.0)) THEN 
DTMP2 - DTMP1 

CALL AXB (VW (1, 2*M+4) , VW (1, 2*M+1) ) 

DTMP1 -1.0 

CALL DSCAL (NLEN, -DTMP1, VW(1, 2*M+1) , 1) 

CALL DAXPY (NLEN, DTMP1, VW (1 , 2*M+3) , 1, VW (1 , 2*M+1) , 1) 

DTMP1 - DNRM2 (NLEN, VW(1,2*M+1) , 1) / R0 

IF (VF.NE.O) WRITE (VF, ' (A30, E25 . 17) ' ) 'Actual norm:', DTMP1 
IF (TF.NE.O) WRITE (TF, ' ( 110, 2E25 . 17) ' ) N-ONK, DTMP2, DTMP1 
ELSE 

IF (TF.NE.O) WRITE (TF, ' ( 110, 2E25 . 17) ' ) N-ONK, DTMP1, -1.0 
END IF 
C 

C Check for convergence. 

C 

IF (DTMP1 . LE . TOL (5) ) THEN 
INFO - 0 
NLIM - N 
END IF 
C 

C Update ONKM2 . 

C 

ONKM2 - ONKM1 
END IF 
C 

C Iterate up to NLIM steps. 

C 

IF (N.LT.NLIM) GO TO 10 
C 

RETURN 

END 

C 

£★*** ********************************************************** ******** 
c 

SUBROUTINE BCKSUB (NDIM,NLEN,NSTART, A,XB,Q) 

C 

C Purpose: 

c Computes XB - inv(A) * XB, where A is upper triangular. 

C 

C Parameters: 

C NDIM - the dimensioned size of A (input) . 

c NLEN - the actual size of A (input) . 

C NS TART - the starting column index for A (input) . 

— array containing the matrix of interest starting in the 
column NSTART and possibly wrapping around as indicated 
by Q (input) . 

— upon entry, the right hand side vector; upon exit, the 
solution, XB is not wrapped (input /output) . 

= integer array specifying the actual indices for wrapping 
purposes. Q(i) is the true index in A of the ith column 




C of the matrix of interest (input) . 

C 

C Noel M. Nachtigal 

C October 8, 1990 

C 

INTEGER NDIM, NLEN, NS TART, Q <*) 

DOUBLE PRECISION A (NDIM, * ) , XB(*) 

C 

C Local variables . 

C 

INTEGER I, ILAST, ITMP, J 
DOUBLE PRECISION DTMP 
C 

C Do the elimination; the only trick here is to keep track of the 

C wrapped columns of A, i.e., A ( : , J) is stored in A(:,Q(J)). 

C 

ILAST - NS TART + NLEN - 1 

XB (NLEN) - XB (NLEN) / A (NLEN, Q (ILAST) ) 

DO 20 I - I LAST-1 , NSTART, -1 
DTMP - 0.0 

ITMP - I - NSTART + 1 
DO 10 J - 1+1, ILAST 

DTMP - DTMP + XB ( J-NSTART+1 ) * A(ITMP,Q(J)) 

10 CONTINUE 

XB (ITMP) - (XB (ITMP) - DTMP) / A (ITMP, Q (I)) 

20 CONTINUE 
C 

RETURN 

END 

C 



on n o ooooooooooo 


£** *★******★★★★★★****** I************************************************ 

C 

C Copyright (C) 1990, Roland W. Freund and Noel M. Nachtigal 

C All rights reserved. 

C 

C No part of this code may be reproduced, stored in a retrieval 

C system, translated, transcribed, transmitted or distributed in 

C any form or by any means means, manual, electric, electronic, 

C electo-magnetic, mechanical, chemical, optical, photocopying, 

C recording, or otherwise, without the prior explicit written 

C permission of the author (s) or their designated proxies. In no 

C event shall the above copyright notice be removed or altered in 

C any way. 

C 

C This code is provided "as is", without any warranty of any kind, 
C either expressed or implied, including but not limited to, any 

C implied warranty of merchantibility or fitness for any purpose. 

C In no event will any party who distributed the code be liable for 

C damages or for any claim (s) by any other party, including but not 

C limited to, any lost profits, lost monies, lost data or data 

C rendered inaccurate, losses sustained by third parties, or any 

C other special, incidental or consequential damages arising out of 

C the use or inability to use the program, even if the possibility 

C of such damages has been advised against. The entire risk as to 

C the quality, the performance, and the fitness of the program for 
C any particular purpose lies with the party using the code . 

C 

Q ***************************************************************** 

C ANY USE OF THIS CODE CONSTITUES ACCEPTANCE OF THE TERMS OF THE 
C ABOVE STATEMENTS 

Q ***************************************************************** 

c 

c 

C This file contains the coefficient functions used by the Lanczos 

C algorithm in the recursion formulas for the inner vectors. The 

C basic recursions are of the form: 

C 

C V_{N+1) - A * V_N - ZETA_N * V_N - ETA_N * V_{N-1} 

C W_{N+1 } - A A T * W_N - ZETA_N * W_N - ETA_N * W_{N-1} 

C 

C The functions in this file compute the coefficients ZETA_N and 

C ETA_N for the various indices N. 

C 

C 

DOUBLE PRECISION FUNCTION DETA(I) 

Purpose : 

Returns the second scalar in the recursion used for inner vectors 
\theta_{i+l } - ( z - dzeta(i) ) \theta_i - deta(i) \theta_{i-l } . 

Parameters : 

I - the degree of the current polynomial, see above (input) . 

Noel M. Nachtigal 
August 28, 1990 

INTEGER I 

DETA - 1.0 


RETURN 

END 






O o o o oooooooooo 


c 

DOUBLE PRECISION FUNCTION DZETA(I) 

Purpose : 

Returns the first scalar in the recursion used for inner vectors, 
\theta_{i+l} - ( z - dzeta(i) ) \theta_i - deta(i) \theta_{ i-1 } . 

Parameters : 

I - the degree of the current polynomial, as above (input) . 

Noel M. Nachtigal 
August 28, 1990 

INTEGER I 

DZETA - 1.0 

RETURN 
END 
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£**★★*★*★**********+**+**************★*★★***★******★****★******★******* 

c 

C Copyright (C) 1990, Roland W. Freund and Noel M. Nachtigal 

C All rights reserved. 

C 

C No part of this code may be reproduced, stored in a retrieval 

C system, translated, transcribed, transmitted or distributed in 

C any form or by any means means, manual, electric, electronic, 

C elect o-magnetic, mechanical, chemical, optical, photocopying, 

C recording, or otherwise, without the prior explicit written 

C permission of the author (s) or their designated proxies. In no 

C event shall the above copyright notice be removed or altered in 

C any way. 

C 

C This code is provided "as is", without any warranty of any kind, 

C either expressed or implied, including but not limited to, any 

C implied warranty of merchantibility or fitness for any purpose. 

C In no event will any party who distributed the code be liable for 

C damages or for any claim (s) by any other party, including but not 

C limited to, any lost profits, lost monies, lost data or data 

C rendered inaccurate, losses sustained by third parties, or any 

C other special, incidental or consequential damages arising out of 

C the use or inability to use the program, even if the possibility 

C of such damages has been advised against. The entire risk as to 

C the quality, the performance, and the fitness of the program for 

C any particular purpose lies with the party using the code. 

C 

q ***************************************************************** 

C ANY USE OF THIS CODE CONST ITUES ACCEPTANCE OF THE TERMS OF THE 

C ABOVE STATEMENTS 

0 ***************************************************************** 

C 

0 ** ************************************************************ ******** 

This file contains the scaling function GETOMG, which computes 
the scaling factors Omega used in scaling the Hessenberg matrix 
from the least squares problem solved by QMR. 

********************************************************************* 

DOUBLE PRECISION FUNCTION GETOMG (I) 

Purpose : 

Returns the scaling parameter OMEGA (i) . 

Parameters : 

I - the index of the parameter OMEGA (input) . 

Noel M. Nachtigal 
October 7, 1990 

INTEGER I 

GETOMG -1.0 

RETURN 
END 

********************************************************************* 


